Performance Comparison of Machine Learning Algorithms for Estimating the Soil Salinity of Salt-Affected Soil Using Field Spectral Data

2019 ◽  
Vol 11 (22) ◽  
pp. 2605 ◽  
Author(s):  
Wang ◽  
Chen ◽  
Wang ◽  
Li

Salt-affected soil is a prominent ecological and environmental problem in dry farming areas throughout the world. China has nearly 9.9 million km2 of salt-affected land. The identification, monitoring, and utilization of soil salinization have become important research topics for promoting sustainable progress. In this paper, five machine learning models are compared using field-measured spectral data, after analysis and transformation, together with soil salinity parameter data: random forest regression (RFR), support vector regression (SVR), gradient-boosted regression tree (GBRT), multilayer perceptron regression (MLPR), and least angle regression (Lars). The performance of each model in estimating soil salinity was evaluated on four aspects: handling of collinearity, handling of data noise, stability, and accuracy. The results demonstrate that among the five models, RFR has the best performance in dealing with collinearity, RFR and MLPR have the best performance in dealing with data noise, and the SVR model is the most stable. The Lars model has the highest accuracy, with a determination coefficient (R2) of 0.87, ratio of performance to deviation (RPD) of 2.67, root mean square error (RMSE) of 0.18, and mean absolute percentage error (MAPE) of 0.11. A comprehensive comparison of the five models then shows that the RFR model has the best overall performance; hence, this method is the most suitable for estimating soil salinity using hyperspectral data. This study can provide a reference for the selection of regression methods in subsequent studies on estimating soil salinity using hyperspectral data.
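The four accuracy measures named in this abstract can be sketched in a few lines. The abstract does not give formulas, so the conventional definitions are assumed here: RPD as the sample standard deviation of the measured values divided by RMSE, and MAPE as a fraction rather than a percentage. The function name and the sample values are hypothetical and are not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, RMSE, RPD and MAPE for paired measured/predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual and total sums of squares.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    # Sample standard deviation of the measured values (n - 1 denominator).
    sd = math.sqrt(ss_tot / (n - 1))
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": rmse,
        "rpd": sd / rmse,  # assumed definition: SD / RMSE
        "mape": sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


# Hypothetical values, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
metrics = regression_metrics(measured, predicted)
```

MAPE divides by each measured value, so this sketch is only usable when no measured value is zero.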

2020 ◽  
Vol 9 (9) ◽  
pp. 507
Author(s):  
Sanjiwana Arjasakusuma ◽  
Sandiaga Swahyu Kusuma ◽  
Stuart Phinn

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data of high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with trees (XGBtree and XGBdart) and linear (XGBlin) classifiers were evaluated. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R2 = 0.53 and RMSE = 1.7 m (18.4% of nRMSE and 0.046 m of bias) for BO-XGBdart and R2 = 0.51 and RMSE = 1.8 m (15.8% of nRMSE and −0.244 m of bias) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection; it could reduce 95% of the data to select the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.


2021 ◽  
Vol 13 (23) ◽  
pp. 4825
Author(s):  
Salman Naimi ◽  
Shamsollah Ayoubi ◽  
Mojtaba Zeraatpisheh ◽  
Jose Alexandre Melo Dematte

Soil salinization is a severe danger to agricultural activity in arid and semi-arid areas, reducing crop production and contributing to land destruction. This investigation aimed to utilize machine learning algorithms to predict spatial soil salinity (dS m−1) by combining environmental covariates derived from remotely sensed (RS) data, a digital elevation model (DEM), and proximal sensing (PS). The study area is located in an arid region of southern Iran (52°51′–53°02′E; 28°16′–28°29′N), in which we collected 300 surface soil samples and acquired the spectral data with RS (Sentinel-2) and PS (electromagnetic induction instrument (EMI) and portable X-ray fluorescence (pXRF)). Afterward, we analyzed the data using five machine learning methods, namely random forest (RF), k-nearest neighbors (kNN), support vector machines (SVM), partial least squares regression (PLSR), and artificial neural networks (ANN), as well as an ensemble of the individual models. To estimate the electrical conductivity of the saturated paste extract (ECe), we built three scenarios: Scenario (1): Synthetic Soil Image (SySI) bands and salinity indices derived from it; Scenario (2): RS data, PS data, topographic attributes, and geology and geomorphology maps; and Scenario (3): the combination of Scenarios (1) and (2). The best prediction accuracy was obtained for the RF model in Scenario (3) (R2 = 0.48 and RMSE = 2.49), followed by the RF model in Scenario (2) (R2 = 0.47 and RMSE = 2.50) and the SVM model in Scenario (1) (R2 = 0.26 and RMSE = 2.97). The ensemble of the five models outperformed every individual model in all scenarios, giving more reliable and more accurate predictions of soil salinity. The relative improvement (RI%) in R2 of the ensemble model over the most accurate individual prediction was 120.95%, 56.82%, and 66.71% for Scenarios (1), (2), and (3), respectively. We applied the best model in each scenario for mapping the soil salinity in the selected area, which indicated that ECe tended to increase from the northwestern to the south and southeastern regions. The areas with high ECe were located in regions that mainly had low elevations and playa. The areas with low ECe were located at higher elevations with steeper slopes and alluvial fans; thus, relief had great importance. This study provides a precise, cost-effective, and scientifically based prediction approach for mapping soil salinity in arid regions for decision-making purposes.
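The abstract above reports RI% without stating its formula. A common reading, assumed here, is the percentage gain of the ensemble R2 over the best individual R2. The function name and the input values below are hypothetical and are not results from the study.

```python
def relative_improvement(baseline_r2, ensemble_r2):
    """Percentage gain of ensemble_r2 over baseline_r2 (assumed RI% definition)."""
    if baseline_r2 <= 0:
        raise ValueError("baseline_r2 must be positive")
    return (ensemble_r2 - baseline_r2) / baseline_r2 * 100.0


# Hypothetical values, for illustration only.
ri_percent = relative_improvement(0.50, 0.75)  # 50.0
```

Because the gain is divided by the baseline, a low baseline R2 produces a large RI% even for a modest absolute gain.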


2021 ◽  
Vol 11 (22) ◽  
pp. 10628
Author(s):  
John Chauvin ◽  
Ray Duran ◽  
Kouhyar Tavakolian ◽  
Alireza Akhbardeh ◽  
Nicholas MacKinnon ◽  
...  

Relative to standard red/green/blue (RGB) imaging systems, hyperspectral imaging systems offer superior capabilities but tend to be expensive and complex, requiring either a mechanically complex push-broom line scanning method, a tunable filter, or a large set of light emitting diodes (LEDs) to collect images in multiple wavelengths. This paper proposes a new methodology to support the design of a hypothesized system that uses three imaging modes, namely fluorescence, visible/near-infrared (VNIR) reflectance, and shortwave infrared (SWIR) reflectance, to capture narrow-band spectral data at only three to seven narrow wavelengths. Simulated annealing is applied to identify the optimal wavelengths for sparse spectral measurement with a cost function based on the accuracy provided by a weighted k-nearest neighbors (WKNN) classifier, a common and relatively robust machine learning classifier. Two separate classification approaches are presented, the first using a multi-layer perceptron (MLP) artificial neural network trained on sparse data from the three individual spectra and the second using a fusion of the data from all three spectra. The results are compared with those from four alternative classifiers based on common machine learning algorithms. To validate the proposed methodology, reflectance and fluorescence spectra in these three spectroscopic modes were collected from fish fillets and used to classify the fillets by species. Accuracies determined from the two classification approaches are compared with benchmark values derived by training the classifiers with the full resolution spectral data. The results of the single-layer classification study show accuracies ranging from ~68% for SWIR reflectance to ~90% for fluorescence with just seven wavelengths. The results of the fusion classification study show accuracies of about 95% with seven wavelengths and more than 90% even with just three wavelengths. Reducing the number of required wavelengths facilitates the creation of rapid and cost-effective spectral imaging systems that can be used for widespread analysis in food monitoring/food fraud, agricultural, and biomedical applications.
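The wavelength search described in this abstract can be illustrated with a generic simulated annealing loop over band subsets. This is a minimal sketch, not the authors' implementation: the paper's cost is based on WKNN classifier accuracy, whereas `toy_cost`, the `INFORMATIVE` band indices, the geometric cooling schedule, and all parameter values below are hypothetical stand-ins.

```python
import math
import random


def anneal_band_subset(n_bands, n_select, cost, steps=2000,
                       t_start=1.0, t_end=0.01, seed=0):
    """Search for n_select band indices (out of n_bands) that minimize cost."""
    rng = random.Random(seed)
    current = rng.sample(range(n_bands), n_select)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Neighbor move: swap one selected band for an unselected one.
        candidate = list(current)
        unused = [b for b in range(n_bands) if b not in current]
        candidate[rng.randrange(n_select)] = rng.choice(unused)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return sorted(best), best_cost


# Hypothetical "informative" band indices standing in for a classifier-based cost.
INFORMATIVE = (12, 47, 83)


def toy_cost(bands):
    """Total distance from each informative band to its nearest selected band."""
    return sum(min(abs(t - b) for b in bands) for t in INFORMATIVE)


selected, selected_cost = anneal_band_subset(100, 3, toy_cost)
```

In a real setting the cost function would train and score a classifier on the candidate bands, which makes each step far more expensive than in this toy example.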


Materials ◽  
2019 ◽  
Vol 12 (9) ◽  
pp. 1475 ◽  
Author(s):  
Safwan Altarazi ◽  
Rula Allaf ◽  
Firas Alhindawi

In this study, machine learning algorithms (MLAs) were employed to predict and classify the tensile strength of polymeric films of different compositions as a function of processing conditions. Two film production techniques were investigated, namely compression molding and extrusion-blow molding. Multi-factor experiments were designed with corresponding parameters. A tensile test was conducted on samples and the tensile strength was recorded. Predictive and classification models from nine MLAs were developed. Performance analysis demonstrated the superior predictive ability of the support vector machine (SVM) algorithm, for which a coefficient of determination and a mean absolute percentage error of 96% and 4%, respectively, were obtained for the extrusion-blow molded films. The classification performance of the MLAs was also evaluated, with several algorithms exhibiting excellent performance.


2018 ◽  
Vol 2018 ◽  
pp. 1-12 ◽  
Author(s):  
Wenlong Jing ◽  
Xia Zhou ◽  
Chen Zhang ◽  
Chongyang Wang ◽  
Hao Jiang

Hyperspectral sensors provide detailed information for dust retention content (DRC) estimation. However, rich hyperspectral data are not fully utilized by traditional image analysis techniques. We integrated several recently developed machine learning algorithms to estimate DRC on plant leaves using the spectra measured by the ASD FieldSpec 3. The experiments were carried out on three common green plants of southern China. The important hyperspectral variables were first identified by applying the random forest (RF) algorithm. Three estimation models were then developed using the support vector machine (SVM), classification and regression tree (CART), and RF algorithms. The results showed that an increase in DRC on plant leaves enhanced their reflectance in the visible wavelengths but weakened their reflectance in the infrared wavelengths. Wavelengths in the ranges of 450–500 nm, 550–600 nm, 750–1000 nm, and 1100–1300 nm were identified as important variables using the RF algorithm and were used to estimate the DRC. The comparison of the three machine learning techniques for DRC estimation confirmed that the SVM and RF models performed well because their estimations were similar to the measured DRC. Specifically, the average R2 values for the SVM and RF models are 0.85 and 0.88, respectively. The technical approach of this study proved to be a successful illustration of using hyperspectral measurements to estimate the DRC on plant leaves. The findings of this study can be applied to monitor the DRC on leaves of other plants and can also be integrated with other types of spectral data to measure the DRC at a regional scale.


2018 ◽  
Vol 58 (8) ◽  
pp. 1488 ◽  
Author(s):  
S. Rahman ◽  
P. Quin ◽  
T. Walsh ◽  
T. Vidal-Calleja ◽  
M. J. McPhee ◽  
...  

The objectives of the present study were to describe the approach used for classifying surface tissue and for estimating fat depth in lamb short loins, and to validate that approach. Fat versus non-fat pixels were classified and then used to estimate the fat depth for each pixel in the hyperspectral image. Estimated reflectance, instead of image intensity or radiance, was used as the input feature for classification. The relationship between reflectance and the fat/non-fat classification label was learnt using support vector machines. Gaussian processes were used to learn regression for fat depth as a function of reflectance. Data to train and test the machine learning algorithms were collected by scanning 16 short loins. The near-infrared hyperspectral camera captured lines of data of the side of the short loin (i.e. with the subcutaneous fat facing the camera). An advanced single-lens reflex camera took photos of the same cuts from above, such that a ground truth of fat depth could be semi-automatically extracted and associated with the hyperspectral data. A subset of the data was used to train the machine learning model, and to test it. The classification of pixels as either fat or non-fat achieved 96% accuracy. Fat depths of up to 12 mm were estimated, with an R2 of 0.59, a mean absolute bias of 1.72 mm and a root mean square error of 2.34 mm. The techniques developed and validated in the present study will be used to estimate fat coverage to predict total fat, and, subsequently, lean meat yield in the carcass.


2021 ◽  
Vol 13 (17) ◽  
pp. 3459
Author(s):  
Joanna Pranga ◽  
Irene Borra-Serrano ◽  
Jonas Aper ◽  
Tom De Swaef ◽  
An Ghesquiere ◽  
...  

High-throughput field phenotyping using close remote sensing platforms and sensors for non-destructive assessment of plant traits can support the objective evaluation of yield predictions of large breeding trials. The main objective of this study was to examine the potential of unmanned aerial vehicle (UAV)-based structural and spectral features and their combination in herbage yield predictions across diploid and tetraploid varieties and breeding populations of perennial ryegrass (Lolium perenne L.). Canopy structural (i.e., canopy height) and spectral (i.e., vegetation indices) information was derived from data gathered with two sensors: a consumer-grade RGB and a 10-band multispectral (MS) camera system, which were compared in the analysis. A total of 468 field plots comprising 115 diploid and 112 tetraploid varieties and populations were considered in this study. A modelling framework established to predict dry matter yield (DMY) was used to test three machine learning algorithms: Partial Least Squares Regression (PLSR), Random Forest (RF), and Support Vector Machines (SVM). The results of the nested cross-validation revealed: (a) the fusion of structural and spectral features achieved better DMY estimates than models fitted with structural or spectral data only, irrespective of the sensor, ploidy level or machine learning algorithm applied; (b) models built with MS-based predictor variables, despite their lower spatial resolution, slightly outperformed the RGB-based models, as they delivered lower mean relative root mean square error (rRMSE) values; and (c) on average, the RF technique reported the best model performances among tested algorithms, regardless of the dataset used. The approach introduced in this study can provide accurate yield estimates (up to an RMSE = 308 kg ha−1) and useful information for breeders and practical farm-scale applications.


Author(s):  
Mikail Purlu ◽  
Belgin Emre Turkay

Many approaches to the planning and operation of power systems, such as network reconfiguration and distributed generation (DG), have been proposed to overcome the challenges caused by the increase in electricity consumption. Alongside their positive effects on the grid, their environmental benefits, and other advantages, rapid developments in renewable energy technologies have made DG resources an important topic; however, improper DG allocation may damage the network. Many studies have applied analytical and heuristic methods based on load flow to optimal DG integration into the network. Here, a novel estimation-based method is proposed to determine the size of DG and its effects on the network while avoiding cumbersome and time-consuming load flow techniques. Machine learning algorithms, such as Linear Regression, Artificial Neural Network, Support Vector Regression, K-Nearest Neighbor, and Decision Tree, have been used for the estimations and have been applied to well-known test systems, such as the IEEE 12-bus, 33-bus, and 69-bus distribution systems. The accuracy of the proposed estimation methods has been verified with R-squared and mean absolute percentage error. Results show that the proposed DG allocation method is effective, applicable, and flexible.


Geotechnics ◽  
2021 ◽  
Vol 1 (2) ◽  
pp. 534-557
Author(s):  
Sivapalan Gajan

The objective of this study is to develop data-driven predictive models for seismic energy dissipation of rocking shallow foundations during earthquake loading using multiple machine learning (ML) algorithms and experimental data from a rocking foundations database. Three nonlinear, nonparametric ML algorithms are considered: k-nearest neighbors regression (KNN), support vector regression (SVR) and decision tree regression (DTR). The input features to the ML algorithms include the critical contact area ratio, slenderness ratio and rocking coefficient of the rocking system, and the peak ground acceleration and Arias intensity of the earthquake motion. A randomly split pair of training and testing datasets is used for initial evaluation of the models and hyperparameter tuning. A repeated k-fold cross-validation technique is used to further evaluate the performance of the ML models in terms of bias and variance using mean absolute percentage error. It is found that all three ML models perform better than a multivariate linear regression model, and that both the KNN and SVR models consistently outperform the DTR model. On average, the accuracy of the KNN model is about 16% higher than that of the SVR model, while the variance of the SVR model is about 27% smaller than that of the KNN model, making them both excellent candidates for modeling the problem considered.
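The evaluation scheme named in this abstract, repeated k-fold cross-validation scored with mean absolute percentage error, can be sketched with a small k-nearest neighbors regressor. This is a minimal sketch on synthetic data, not the study's database or tuned models. All function names, the fold and repeat counts, and the use of the mean and standard deviation of fold scores as stand-ins for bias and variance are assumptions.

```python
import random
import statistics


def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in order[:k]) / k


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; y_true must be non-zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_kfold_mape(x, y, n_splits=5, n_repeats=3, k=3, seed=0):
    """Return one MAPE score per fold per repeat (n_splits * n_repeats values)."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_repeats):
        idx = list(range(len(x)))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        folds = [idx[i::n_splits] for i in range(n_splits)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in idx if i not in held_out]
            train_x = [x[i] for i in train_idx]
            train_y = [y[i] for i in train_idx]
            preds = [knn_predict(train_x, train_y, x[i], k) for i in test_idx]
            scores.append(mape([y[i] for i in test_idx], preds))
    return scores


# Synthetic data, for illustration only (targets are kept away from zero).
data_rng = random.Random(42)
features = [[data_rng.uniform(0, 1), data_rng.uniform(0, 1)] for _ in range(40)]
targets = [1.0 + 2.0 * a + b for a, b in features]
scores = repeated_kfold_mape(features, targets)
mean_mape = statistics.mean(scores)   # average error across folds and repeats
spread = statistics.stdev(scores)     # fold-to-fold variability
```

Repeating the partition several times reduces the dependence of the estimate on any single random split, which matters for small experimental datasets.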


2020 ◽  
Vol 12 (2) ◽  
pp. 84-99
Author(s):  
Li-Pang Chen

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating the close price, open price, daily low, daily high, adjusted close price, and trading volume as predictors in the machine learning methods, we show that the prediction accuracy is improved.

