Performance Comparison of Machine Learning Algorithms for Estimating the Soil Salinity of Salt-Affected Soil Using Field Spectral Data

Wang;  Chen;  Wang;  Li

doi:10.3390/rs11222605

Remote Sensing ◽

10.3390/rs11222605 ◽

2019 ◽

Vol 11 (22) ◽

pp. 2605 ◽

Cited By ~ 3

Author(s):

Wang ◽

Chen ◽

Wang ◽

Keyword(s):

Machine Learning ◽

Spectral Data ◽

Soil Salinity ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Percentage Error ◽

Support Vector ◽

Boosted Regression Tree ◽

Data Noise ◽

Salt Affected Soil

Salt-affected soil is a prominent ecological and environmental problem in dry farming areas throughout the world. China has nearly 9.9 million km² of salt-affected land. The identification, monitoring, and utilization of soil salinization have become important research topics for promoting sustainable progress. In this paper, field-measured spectral data and soil salinity parameter data are analyzed and transformed, and five machine learning models are compared: random forest regression (RFR), support vector regression (SVR), gradient-boosted regression tree (GBRT), multilayer perceptron regression (MLPR), and least angle regression (Lars). Each model's performance in estimating soil salinity is evaluated on four aspects: handling of collinearity, handling of data noise, stability, and accuracy. The results demonstrate that among the five models, RFR has the best performance in dealing with collinearity, RFR and MLPR have the best performance in dealing with data noise, and the SVR model is the most stable. The Lars model has the highest accuracy, with a determination coefficient (R²) of 0.87, ratio of performance to deviation (RPD) of 2.67, root mean square error (RMSE) of 0.18, and mean absolute percentage error (MAPE) of 0.11. A comprehensive comparison of the five models finds that the RFR model has the best overall performance; hence, this method is the most suitable for estimating soil salinity using hyperspectral data. This study can provide a reference for the selection of regression methods in subsequent studies on estimating soil salinity using hyperspectral data.
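The four accuracy measures named in this abstract (R², RPD, RMSE, MAPE) can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not give the formulas, so the definitions below are common textbook forms and should be read as assumptions (in particular, RPD is taken here as the sample standard deviation of the observed values divided by the RMSE, and MAPE is returned as a fraction rather than a percentage). The `observed` and `predicted` values are hypothetical.

```python
import math
from statistics import mean, stdev


def rmse(observed, predicted):
    """Root mean square error of the predictions."""
    return math.sqrt(mean((o - p) ** 2 for o, p in zip(observed, predicted)))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    o_bar = mean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpd(observed, predicted):
    """Ratio of performance to deviation (assumed: sample SD / RMSE)."""
    return stdev(observed) / rmse(observed, predicted)


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction; observed values must be nonzero."""
    return mean(abs((o - p) / o) for o, p in zip(observed, predicted))


# Hypothetical salinity values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

print(f"R2   = {r_squared(observed, predicted):.3f}")
print(f"RPD  = {rpd(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAPE = {mape(observed, predicted):.3f}")
```

Under these definitions a higher R² and RPD and a lower RMSE and MAPE all indicate a better fit, which is the sense in which the abstract ranks the Lars model as most accurate.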

Evaluating Variable Selection and Machine Learning Algorithms for Estimating Forest Heights by Combining Lidar and Hyperspectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9090507 ◽

2020 ◽

Vol 9 (9) ◽

pp. 507

Author(s):

Sanjiwana Arjasakusuma ◽

Sandiaga Swahyu Kusuma ◽

Stuart Phinn

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Learning Algorithms ◽

Principal Component ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Support Vector ◽

Forest Height ◽

Extreme Gradient Boosting

Machine learning has been employed for various mapping and modeling tasks using input variables from different sources of remote sensing data. For feature selection involving data with high spatial and spectral dimensionality, various methods have been developed and incorporated into the machine learning framework to ensure an efficient and optimal computational process. This research aims to assess the accuracy of various feature selection and machine learning methods for estimating forest height using AISA (airborne imaging spectrometer for applications) hyperspectral bands (479 bands) and airborne light detection and ranging (lidar) height metrics (36 metrics), alone and combined. Feature selection and dimensionality reduction using Boruta (BO), principal component analysis (PCA), simulated annealing (SA), and genetic algorithm (GA) were evaluated in combination with machine learning algorithms such as multivariate adaptive regression spline (MARS), extra trees (ET), support vector regression (SVR) with radial basis function, and extreme gradient boosting (XGB) with tree (XGBtree and XGBdart) and linear (XGBlin) classifiers. The results demonstrated that the combinations of BO-XGBdart and BO-SVR delivered the best model performance for estimating tropical forest height by combining lidar and hyperspectral data, with R² = 0.53 and RMSE = 1.7 m (nRMSE of 18.4% and bias of 0.046 m) for BO-XGBdart and R² = 0.51 and RMSE = 1.8 m (nRMSE of 15.8% and bias of −0.244 m) for BO-SVR. Our study also demonstrated the effectiveness of BO for variable selection; it reduced the data by 95%, selecting the 29 most important variables from the initial 516 variables from lidar metrics and hyperspectral data.

Ground Observations and Environmental Covariates Integration for Mapping of Soil Salinity: A Machine Learning-Based Approach

Remote Sensing ◽

10.3390/rs13234825 ◽

2021 ◽

Vol 13 (23) ◽

pp. 4825

Author(s):

Salman Naimi ◽

Shamsollah Ayoubi ◽

Mojtaba Zeraatpisheh ◽

Jose Alexandre Melo Dematte

Keyword(s):

Machine Learning ◽

Soil Salinity ◽

Crop Production ◽

Soil Salinization ◽

Machine Learning Algorithms ◽

Support Vector ◽

Agricultural Activity ◽

Ensemble Modeling ◽

Least Squares Regression ◽

Environmental Covariates

Soil salinization is a severe danger to agricultural activity in arid and semi-arid areas, reducing crop production and contributing to land destruction. This investigation aimed to utilize machine learning algorithms to predict spatial soil salinity (dS m⁻¹) by combining environmental covariates derived from remotely sensed (RS) data, a digital elevation model (DEM), and proximal sensing (PS). The study area is located in an arid region of southern Iran (52°51′–53°02′E; 28°16′–28°29′N), in which we collected 300 surface soil samples and acquired the spectral data with RS (Sentinel-2) and PS (electromagnetic induction instrument (EMI) and portable X-ray fluorescence (pXRF)). Afterward, we analyzed the data using five machine learning methods, namely random forest (RF), k-nearest neighbors (kNN), support vector machines (SVM), partial least squares regression (PLSR), and artificial neural networks (ANN), and the ensemble of individual models. To estimate the electrical conductivity of the saturated paste extract (ECe), we built three scenarios, including Scenario (1): Synthetic Soil Image (SySI) bands and salinity indices derived from it; Scenario (2): RS data, PS data, topographic attributes, and geology and geomorphology maps; and Scenario (3): the combination of Scenarios (1) and (2). The best prediction accuracy was obtained for the RF model in Scenario (3) (R² = 0.48 and RMSE = 2.49), followed by Scenario (2) (RF model, R² = 0.47 and RMSE = 2.50) and Scenario (1) for the SVM model (R² = 0.26 and RMSE = 2.97). According to ensemble modeling, a combined strategy with the five models exceeded the performance of all the single ones and predicted soil salinity in all scenarios. The results revealed that the ensemble modeling method had higher reliability and predicted soil salinity more accurately than the individual approach. Relative improvement (RI%) showed that the R² index in the ensemble model improved compared to the most precise prediction for Scenarios (1), (2), and (3) by 120.95%, 56.82%, and 66.71%, respectively. We applied the best model in each scenario for mapping the soil salinity in the selected area, which indicated that ECe tended to increase from the northwestern to the south and southeastern regions. The area with high ECe was located in the regions that mainly had low elevations and playa. The areas with low ECe were located in the higher elevations with steeper slopes and alluvial fans, and thus, relief had great importance. This study provides a precise, cost-effective, and scientific basis for prediction for decision-making purposes to map soil salinity in arid regions.
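The ensemble and relative-improvement ideas in this abstract can be sketched as below. The abstract does not state how the individual models are combined or how RI% is computed, so both choices here are assumptions: the ensemble is an unweighted mean of per-sample predictions, and RI% is the change relative to the baseline score, expressed as a percentage. Model names and all numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def ensemble_mean(predictions_by_model):
    """Unweighted average of per-sample predictions across models (assumed combiner)."""
    return [mean(sample) for sample in zip(*predictions_by_model.values())]


def relative_improvement(baseline, improved):
    """Assumed RI% definition: gain over the baseline as a percentage of the baseline."""
    return (improved - baseline) / baseline * 100.0


# Hypothetical ECe predictions (dS/m) from three models for four samples.
predictions = {
    "rf": [2.0, 4.0, 6.0, 8.0],
    "svm": [3.0, 5.0, 5.0, 9.0],
    "knn": [1.0, 3.0, 7.0, 7.0],
}

combined = ensemble_mean(predictions)
print(combined)  # [2.0, 4.0, 6.0, 8.0]

# Hypothetical scores: a single-model R2 of 0.50 and an ensemble R2 of 0.75.
print(relative_improvement(0.50, 0.75))  # 50.0
```

Averaging tends to cancel errors that the individual models make in different directions, which is one plausible reason an ensemble can outperform each of its members.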

Simulated Annealing-Based Hyperspectral Data Optimization for Fish Species Classification: Can the Number of Measured Wavelengths Be Reduced?

Applied Sciences ◽

10.3390/app112210628 ◽

2021 ◽

Vol 11 (22) ◽

pp. 10628

Author(s):

John Chauvin ◽

Ray Duran ◽

Kouhyar Tavakolian ◽

Alireza Akhbardeh ◽

Nicholas MacKinnon ◽

...

Keyword(s):

Machine Learning ◽

Simulated Annealing ◽

Spectral Data ◽

Near Infrared ◽

Tunable Filter ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Imaging Systems ◽

Large Set ◽

Scanning Method

Relative to standard red/green/blue (RGB) imaging systems, hyperspectral imaging systems offer superior capabilities but tend to be expensive and complex, requiring either a mechanically complex push-broom line scanning method, a tunable filter, or a large set of light emitting diodes (LEDs) to collect images in multiple wavelengths. This paper proposes a new methodology to support the design of a hypothesized system that uses three imaging modes (fluorescence, visible/near-infrared (VNIR) reflectance, and shortwave infrared (SWIR) reflectance) to capture narrow-band spectral data at only three to seven narrow wavelengths. Simulated annealing is applied to identify the optimal wavelengths for sparse spectral measurement with a cost function based on the accuracy provided by a weighted k-nearest neighbors (WKNN) classifier, a common and relatively robust machine learning classifier. Two separate classification approaches are presented, the first using a multi-layer perceptron (MLP) artificial neural network trained on sparse data from the three individual spectra and the second using a fusion of the data from all three spectra. The results are compared with those from four alternative classifiers based on common machine learning algorithms. To validate the proposed methodology, reflectance and fluorescence spectra in these three spectroscopic modes were collected from fish fillets and used to classify the fillets by species. Accuracies determined from the two classification approaches are compared with benchmark values derived by training the classifiers with the full resolution spectral data. The results of the single-layer classification study show accuracies ranging from ~68% for SWIR reflectance to ~90% for fluorescence with just seven wavelengths. The results of the fusion classification study show accuracies of about 95% with seven wavelengths and more than 90% even with just three wavelengths. Reducing the number of required wavelengths facilitates the creation of rapid and cost-effective spectral imaging systems that can be used for widespread analysis in food monitoring/food fraud, agricultural, and biomedical applications.
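The wavelength-selection step described in this abstract can be illustrated with a generic simulated annealing loop over fixed-size index subsets. This is a sketch under stated assumptions, not the authors' implementation: the paper's cost function is based on WKNN classification accuracy, whereas the `toy_cost` below is a hypothetical stand-in that simply sums made-up per-wavelength scores. The function name `anneal_subset`, the swap move, the geometric cooling schedule, and all parameter defaults are illustrative choices.

```python
import math
import random


def anneal_subset(n_candidates, k, cost, steps=2000, t_start=1.0, t_end=1e-3, seed=0):
    """Search for k of n_candidates indices that minimize cost(subset).

    Requires 0 < k < n_candidates and steps >= 2.
    """
    rng = random.Random(seed)
    current = rng.sample(range(n_candidates), k)
    current_cost = cost(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Propose a neighbor: swap one selected index for an unselected one.
        candidate = list(current)
        unused = [i for i in range(n_candidates) if i not in current]
        candidate[rng.randrange(k)] = rng.choice(unused)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return sorted(best), best_cost


# Hypothetical per-wavelength usefulness scores (not measured data).
scores = [(i * 37) % 101 for i in range(50)]


def toy_cost(subset):
    """Stand-in cost: negative total score, so higher-scoring subsets cost less."""
    return -sum(scores[i] for i in subset)


best, best_cost = anneal_subset(len(scores), 5, toy_cost)
print(best, best_cost)
```

In a setting like the one described, `cost` would instead train and score a classifier on the spectra restricted to the selected wavelengths, for example returning one minus the cross-validated accuracy.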

Machine Learning Models for Predicting and Classifying the Tensile Strength of Polymeric Films Fabricated via Different Production Processes

Materials ◽

10.3390/ma12091475 ◽

2019 ◽

Vol 12 (9) ◽

pp. 1475 ◽

Cited By ~ 3

Author(s):

Safwan Altarazi ◽

Rula Allaf ◽

Firas Alhindawi

Keyword(s):

Machine Learning ◽

Tensile Strength ◽

Predictive Ability ◽

Classification Performance ◽

Machine Learning Algorithms ◽

Polymeric Films ◽

Coefficient Of Determination ◽

Percentage Error ◽

Support Vector ◽

Extrusion Blow Molding

In this study, machine learning algorithms (MLA) were employed to predict and classify the tensile strength of polymeric films of different compositions as a function of processing conditions. Two film production techniques were investigated, namely compression molding and extrusion-blow molding. Multi-factor experiments were designed with corresponding parameters. A tensile test was conducted on samples and the tensile strength was recorded. Predictive and classification models from nine MLA were developed. Performance analysis demonstrated the superior predictive ability of the support vector machine (SVM) algorithm, for which a coefficient of determination of 96% and a mean absolute percentage error of 4% were obtained for the extrusion-blow molded films. The classification performance of the MLA was also evaluated, with several algorithms exhibiting excellent performance.

Machine Learning for Estimating Leaf Dust Retention Based on Hyperspectral Measurements

Journal of Sensors ◽

10.1155/2018/6026259 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Wenlong Jing ◽

Xia Zhou ◽

Chen Zhang ◽

Chongyang Wang ◽

Hao Jiang

Keyword(s):

Machine Learning ◽

Southern China ◽

Regional Scale ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Classification And Regression Tree ◽

Machine Learning Techniques ◽

Support Vector ◽

Plant Leaves ◽

Dust Retention

Hyperspectral sensors provide detailed information for dust retention content (DRC) estimation. However, rich hyperspectral data are not fully utilized by traditional image analysis techniques. We integrated several recently developed machine learning algorithms to estimate DRC on plant leaves using the spectra measured by the ASD FieldSpec 3. The experiments were carried out on three common green plants of southern China. The important hyperspectral variables were first identified by applying the random forest (RF) algorithm. Three estimation models were then developed using the support vector machine (SVM), classification and regression tree (CART), and RF algorithms. The results showed that the increase in dust retention content on plant leaves enhanced their reflectance in the visible wavelengths but weakened their reflectance in the infrared wavelengths. Wavelengths in the ranges of 450–500 nm, 550–600 nm, 750–1000 nm, and 1100–1300 nm were identified as important variables using the RF algorithm and were used to estimate the DRC. The comparison of the three machine learning techniques for DRC estimation confirmed that the SVM and RF models performed well because their estimations were similar to the measured DRC. Specifically, the average R² values for the SVM and RF models were 0.85 and 0.88, respectively. The technical approach of this study proved to be a successful illustration of using hyperspectral measurements to estimate the DRC on plant leaves. The findings of this study can be applied to monitor the DRC on leaves of other plants and can also be integrated with other types of spectral data to measure the DRC at a regional scale.

Preliminary estimation of fat depth in the lamb short loin using a hyperspectral camera

Animal Production Science ◽

10.1071/an17795 ◽

2018 ◽

Vol 58 (8) ◽

pp. 1488 ◽

Cited By ~ 2

Author(s):

S. Rahman ◽

P. Quin ◽

T. Walsh ◽

T. Vidal-Calleja ◽

M. J. McPhee ◽

...

Keyword(s):

Machine Learning ◽

Near Infrared ◽

Hyperspectral Image ◽

Subcutaneous Fat ◽

Ground Truth ◽

Hyperspectral Data ◽

Machine Learning Algorithms ◽

Support Vector ◽

Total Fat ◽

Hyperspectral Camera

The objectives of the present study were to describe and validate the approach used for classifying surface tissue and for estimating fat depth in lamb short loins. Fat versus non-fat pixels were classified and then used to estimate the fat depth for each pixel in the hyperspectral image. Estimated reflectance, instead of image intensity or radiance, was used as the input feature for classification. The relationship between reflectance and the fat/non-fat classification label was learnt using support vector machines. Gaussian processes were used to learn regression for fat depth as a function of reflectance. Data to train and test the machine learning algorithms were collected by scanning 16 short loins. The near-infrared hyperspectral camera captured lines of data of the side of the short loin (i.e. with the subcutaneous fat facing the camera). An advanced single-lens reflex camera took photos of the same cuts from above, such that a ground truth of fat depth could be semi-automatically extracted and associated with the hyperspectral data. A subset of the data was used to train the machine learning model, and to test it. The classification of pixels as either fat or non-fat achieved 96% accuracy. Fat depths of up to 12 mm were estimated, with an R² of 0.59, a mean absolute bias of 1.72 mm and a root mean square error of 2.34 mm. The techniques developed and validated in the present study will be used to estimate fat coverage to predict total fat and, subsequently, lean meat yield in the carcass.

Improving Accuracy of Herbage Yield Predictions in Perennial Ryegrass with UAV-Based Structural and Spectral Data Fusion and Machine Learning

Remote Sensing ◽

10.3390/rs13173459 ◽

2021 ◽

Vol 13 (17) ◽

pp. 3459

Author(s):

Joanna Pranga ◽

Irene Borra-Serrano ◽

Jonas Aper ◽

Tom De Swaef ◽

An Ghesquiere ◽

...

Keyword(s):

Machine Learning ◽

Spectral Data ◽

Perennial Ryegrass ◽

Plant Traits ◽

Learning Algorithm ◽

Machine Learning Algorithms ◽

Support Vector ◽

Spectral Features ◽

Least Squares Regression ◽

Herbage Yield

High-throughput field phenotyping using close remote sensing platforms and sensors for non-destructive assessment of plant traits can support the objective evaluation of yield predictions of large breeding trials. The main objective of this study was to examine the potential of unmanned aerial vehicle (UAV)-based structural and spectral features and their combination in herbage yield predictions across diploid and tetraploid varieties and breeding populations of perennial ryegrass (Lolium perenne L.). Canopy structural (i.e., canopy height) and spectral (i.e., vegetation indices) information was derived from data gathered with two sensors: a consumer-grade RGB and a 10-band multispectral (MS) camera system, which were compared in the analysis. A total of 468 field plots comprising 115 diploid and 112 tetraploid varieties and populations were considered in this study. A modelling framework established to predict dry matter yield (DMY) was used to test three machine learning algorithms: Partial Least Squares Regression (PLSR), Random Forest (RF), and Support Vector Machines (SVM). The results of the nested cross-validation revealed: (a) the fusion of structural and spectral features achieved better DMY estimates as compared to models fitted with structural or spectral data only, irrespective of the sensor, ploidy level or machine learning algorithm applied; (b) models built with MS-based predictor variables, despite their lower spatial resolution, slightly outperformed the RGB-based models, as lower mean relative root mean square error (rRMSE) values were delivered; and (c) on average, the RF technique reported the best model performances among tested algorithms, regardless of the dataset used. The approach introduced in this study can provide accurate yield estimates (up to an RMSE = 308 kg ha⁻¹) and useful information for breeders and practical farm-scale applications.

Estimating the Distributed Generation Unit Sizing and Its Effects on the Distribution System by Using Machine Learning Methods

Elektronika ir Elektrotechnika ◽

10.5755/j02.eie.28864 ◽

2021 ◽

Author(s):

Mikail Purlu ◽

Belgin Emre Turkay

Keyword(s):

Machine Learning ◽

Distributed Generation ◽

Distribution System ◽

Distribution Systems ◽

Load Flow ◽

Machine Learning Algorithms ◽

Estimation Methods ◽

Percentage Error ◽

Support Vector ◽

Positive Effects

Many approaches to the planning and operation of power systems, such as network reconfiguration and distributed generation (DG), have been proposed to overcome the challenges caused by the increase in electricity consumption. Alongside the positive effects on the grid, the contributions regarding environmental pollution, and other advantages, rapid developments in renewable energy technologies have made DG resources an important issue. However, improper DG allocation may damage the network. Many studies have applied analytical and heuristic methods based on load flow for optimal DG integration into the network. A novel estimation-based method is proposed to determine the size of DG and its effects on the network, avoiding the burdensome and time-consuming load flow techniques. Machine learning algorithms, such as Linear Regression, Artificial Neural Network, Support Vector Regression, K-Nearest Neighbor, and Decision Tree, have been used for the estimations and have been applied to well-known test systems, such as IEEE 12-bus, 33-bus, and 69-bus distribution systems. The accuracy of the proposed estimation methods has been verified with R-squared and mean absolute percentage error. Results show that the proposed DG allocation method is effective, applicable, and flexible.

Modeling of Seismic Energy Dissipation of Rocking Foundations Using Nonparametric Machine Learning Algorithms

Geotechnics ◽

10.3390/geotechnics1020024 ◽

2021 ◽

Vol 1 (2) ◽

pp. 534-557

Author(s):

Sivapalan Gajan

Keyword(s):

Machine Learning ◽

Energy Dissipation ◽

Slenderness Ratio ◽

Seismic Energy ◽

Machine Learning Algorithms ◽

Percentage Error ◽

Support Vector ◽

K Nearest Neighbors ◽

Ground Acceleration ◽

Rocking Foundations

The objective of this study is to develop data-driven predictive models for seismic energy dissipation of rocking shallow foundations during earthquake loading using multiple machine learning (ML) algorithms and experimental data from a rocking foundations database. Three nonlinear, nonparametric ML algorithms are considered: k-nearest neighbors regression (KNN), support vector regression (SVR) and decision tree regression (DTR). The input features to the ML algorithms include the critical contact area ratio, slenderness ratio and rocking coefficient of the rocking system, and the peak ground acceleration and Arias intensity of the earthquake motion. A randomly split pair of training and testing datasets is used for initial evaluation of the models and hyperparameter tuning. A repeated k-fold cross validation technique is used to further evaluate the performance of the ML models in terms of bias and variance using mean absolute percentage error. It is found that all three ML models perform better than a multivariate linear regression model, and that both the KNN and SVR models consistently outperform the DTR model. On average, the accuracy of the KNN model is about 16% higher than that of the SVR model, while the variance of the SVR model is about 27% smaller than that of the KNN model, making them both excellent candidates for modeling the problem considered.
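The repeated k-fold evaluation mentioned in this abstract can be sketched in plain Python as follows. This is an illustrative sketch only: the one-feature k-nearest-neighbors regressor, the fold construction, the defaults (`n_splits=5`, `n_repeats=3`, `k=3`), and the synthetic data are all assumptions, and the paper's actual features, models, and settings are not reproduced. The mean of the per-fold MAPE values is used as a rough indicator of average error and their variance as an indicator of stability.

```python
import random
from statistics import mean, pvariance


def knn_predict(train_x, train_y, x, k=3):
    """Predict by averaging the targets of the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return mean(train_y[i] for i in nearest)


def mape(observed, predicted):
    """Mean absolute percentage error as a fraction; observed values must be nonzero."""
    return mean(abs((o - p) / o) for o, p in zip(observed, predicted))


def repeated_kfold_mape(xs, ys, n_splits=5, n_repeats=3, seed=0):
    """Return (mean, variance) of MAPE over n_repeats reshuffled k-fold splits."""
    rng = random.Random(seed)
    fold_scores = []
    for _ in range(n_repeats):
        order = list(range(len(xs)))
        rng.shuffle(order)
        # Deal the shuffled indices into n_splits disjoint folds.
        folds = [order[f::n_splits] for f in range(n_splits)]
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            train_x = [xs[i] for i in train_idx]
            train_y = [ys[i] for i in train_idx]
            preds = [knn_predict(train_x, train_y, xs[i]) for i in test_idx]
            fold_scores.append(mape([ys[i] for i in test_idx], preds))
    return mean(fold_scores), pvariance(fold_scores)


# Synthetic, noise-free data for illustration only.
xs = [i / 10 for i in range(1, 41)]
ys = [2 * x + 1 for x in xs]

mean_mape, var_mape = repeated_kfold_mape(xs, ys)
print(f"mean MAPE = {mean_mape:.4f}, variance = {var_mape:.6f}")
```

Repeating the split with different shuffles reduces the dependence of the estimate on any single partition, which is why the spread of fold scores is a useful complement to their average when comparing models.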

Using Machine Learning Algorithms on Prediction of Stock Price

Journal of Modeling and Optimization ◽

10.32732/jmo.2020.12.2.84 ◽

2020 ◽

Vol 12 (2) ◽

pp. 84-99

Author(s):

Li-Pang Chen

Keyword(s):

Machine Learning ◽

Stock Price ◽

Short Term Memory ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Support Vector ◽

Short Term ◽

Learning Techniques ◽

Historical Database ◽

Long Short Term Memory

In this paper, we investigate the analysis and prediction of time-dependent data. We focus on four different stocks selected from the Yahoo Finance historical database. To build models and predict future stock prices, we consider three machine learning techniques: Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and Support Vector Regression (SVR). By treating close price, open price, daily low, daily high, adjusted close price, and volume of trades as predictors in the machine learning methods, it can be shown that the prediction accuracy is improved.
