Analysis of automated estimation models using machine learning

Machine learning algorithms should be tested for use in quantitative precipitation estimation models of rain radar data in South Korea because such an application can provide a more accurate estimate of rainfall than the conventional ZR relationship-based model. The applicability of random forest, stochastic gradient boosted model, and extreme learning machine methods to quantitative precipitation estimation models was investigated using case studies with polarization radar data from Gwangdeoksan radar station. Various combinations of input variable sets were tested, and results showed that machine learning algorithms can be applied to build the quantitative precipitation estimation model of the polarization radar data in South Korea. The machine learning-based quantitative precipitation estimation models led to better performances than ZR relationship-based models, particularly for heavy rainfall events. The extreme learning machine is considered the best of the algorithms used based on evaluation criteria.
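For context, the conventional ZR baseline mentioned above converts radar reflectivity Z to rain rate R through a power law, Z = a·R^b. The sketch below inverts that relation. The coefficients a = 200 and b = 1.6 are the classic Marshall-Palmer values and are an assumption here; the abstract does not state which coefficients the study used. The function name is hypothetical.

```python
import math


def rain_rate_from_dbz(dbz, a=200.0, b=1.6):
    """Invert Z = a * R**b to get rain rate R (mm/h) from reflectivity in dBZ.

    a and b default to the Marshall-Palmer values (an assumption, not
    taken from the study summarized above).
    """
    z_linear = 10.0 ** (dbz / 10.0)  # dBZ -> linear Z (mm^6 m^-3)
    return (z_linear / a) ** (1.0 / b)


if __name__ == "__main__":
    for dbz in (20, 35, 50):
        print(f"{dbz} dBZ -> {rain_rate_from_dbz(dbz):.2f} mm/h")
```

Because the relation is a fixed power law, any error in a and b propagates directly into the rain rate, which is one reason data-driven models may do better for heavy rainfall.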


Systematic literature review of machine learning based software development effort estimation models

Information and Software Technology ◽

10.1016/j.infsof.2011.09.002 ◽

2012 ◽

Vol 54 (1) ◽

pp. 41-59 ◽

Cited By ~ 184

Author(s):

Jianfeng Wen ◽

Shixian Li ◽

Zhiyong Lin ◽

Yong Hu ◽

Changqin Huang

Keyword(s):

Machine Learning ◽

Literature Review ◽

Software Development ◽

Systematic Literature Review ◽

Development Effort ◽

Effort Estimation ◽

Software Development Effort ◽

Estimation Models ◽

Software Development Effort Estimation


Study on the Estimation of Forest Volume Based on Multi-Source Data

Sensors ◽

10.3390/s21237796 ◽

2021 ◽

Vol 21 (23) ◽

pp. 7796

Author(s):

Tao Hu ◽

Yuman Sun ◽

Weiwei Jia ◽

Dandan Li ◽

Maosheng Zou ◽

...

Keyword(s):

Neural Network ◽

Machine Learning ◽

Remote Sensing ◽

Artificial Neural Network ◽

Random Forest ◽

Hybrid Model ◽

Prediction Accuracy ◽

Volume Estimation ◽

Support Vector ◽

Estimation Models

We performed a comparative analysis of the prediction accuracy of machine learning methods and ordinary Kriging (OK) hybrid methods for forest volume models based on multi-source remote sensing data combined with ground survey data. Taking Larix olgensis, Pinus koraiensis, and Pinus sylvestris plantations in Mengjiagang forest farms as the research object, based on the Chinese Academy of Forestry LiDAR, charge-coupled device, and hyperspectral (CAF-LiTCHy) integrated system, we extracted the visible vegetation index, texture features, terrain factors, and point cloud feature variables, respectively. Random forest (RF), support vector regression (SVR), and an artificial neural network (ANN) were used to estimate forest volume. In the small-scale space, the estimation of sample plot volume is influenced by the surrounding environment as well as the neighboring observed data. Based on the residuals of these three machine learning models, OK interpolation was applied to construct new hybrid forest volume estimation models called random forest Kriging (RFK), support vector regression Kriging (SVRK), and artificial neural network Kriging (ANNK). The six estimation models of forest volume were tested using the leave-one-out (Loo) cross-validation method. All six models achieve leave-one-out R² (R²_Loo) values above 0.6, and the hybrid models all improve prediction accuracy to different extents. Among the six models, the RFK hybrid model had the best prediction effect, with an R²_Loo reaching 0.915. Therefore, the machine learning method based on multi-source remote sensing factors is useful for forest volume estimation; in particular, the hybrid model constructed by combining machine learning and the OK method greatly improved the accuracy of forest volume estimation, which thus provides a fast and effective method for the remote sensing inversion estimation of forest volume and facilitates the management of forest resources.
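The leave-one-out R² used to score these models can be illustrated with a minimal sketch. It assumes the common definition, 1 − SS_res/SS_tot computed from held-out predictions, which the abstract does not spell out. A one-variable least-squares line stands in for the RF, SVR, and ANN models, and all function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_r2(xs, ys):
    """Leave-one-out R^2: each sample is predicted by a model fitted without it."""
    preds = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[i] + intercept)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    plots_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    plots_y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
    print(f"R2_Loo = {loo_r2(plots_x, plots_y):.3f}")
```

Because every prediction comes from a model that never saw the held-out plot, this score is usually lower than an in-sample R² and can even be negative for a poor model.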


Machine Learning Modeling of Forest Road Construction Costs

Forests ◽

10.3390/f12091169 ◽

2021 ◽

Vol 12 (9) ◽

pp. 1169

Author(s):

Abolfazl Jaafari ◽

Iman Pazhouhan ◽

Pete Bettinger

Keyword(s):

Machine Learning ◽

Real World ◽

Cost Estimation ◽

Road Construction ◽

Construction Costs ◽

Real World Data ◽

World Data ◽

The Real ◽

Estimation Models ◽

Cost Estimation Models

The economics of the forestry enterprise are largely measured by their performance in road construction and management. The construction of forest roads requires tremendous capital outlays and usually constitutes a major component of the construction industry. The availability of cost estimation models assisting in the early stages of a project would therefore be of great help for timely costing of alternatives and more economical solutions. This study describes the development and application of such cost estimation models. First, the main cost elements and variables affecting total construction costs were determined; the real-world data were derived from the project bids and an analysis of 300 segments of a three-kilometer road constructed in the Hyrcanian Forests of Iran. Then, five state-of-the-art machine learning methods, i.e., linear regression (LR), K-Star, multilayer perceptron neural network (MLP), support vector machine (SVM), and instance-based learning (IBL), were applied to develop models that would estimate construction costs from the real-world data. The performance of the models was measured using the correlation coefficient (R), root mean square error (RMSE), and percent of relative error index (PREI). The results showed that the IBL model had the highest training performance (R = 0.998, RMSE = 1.4%), whereas the SVM model had the highest estimation capability (R = 0.993, RMSE = 2.44%). PREI indicated that all models but IBL (mean PREI = 0.0021%) slightly underestimated the construction costs. Despite these few differences, the results demonstrated that the cost estimations developed here were consistent with the project bids, and our models thus can serve as a guideline for better allocating financial resources in the early stages of the bidding process.
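The evaluation measures named above can be sketched as follows. R and RMSE follow their standard definitions. The abstract does not define PREI or say how RMSE was expressed as a percentage, so `mean_relative_error_pct` is only one plausible reading (a mean signed relative error, negative when costs are underestimated). All function names and sample values are hypothetical.

```python
import math


def pearson_r(observed, predicted):
    """Pearson correlation coefficient between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_relative_error_pct(observed, predicted):
    """Mean signed relative error in percent (an assumed stand-in for PREI)."""
    n = len(observed)
    return 100.0 * sum((p - o) / o for o, p in zip(observed, predicted)) / n


if __name__ == "__main__":
    # Illustrative costs only, not data from the study.
    bid_costs = [100.0, 200.0, 300.0]
    model_costs = [90.0, 190.0, 310.0]
    print(f"R = {pearson_r(bid_costs, model_costs):.4f}")
    print(f"RMSE = {rmse(bid_costs, model_costs):.2f}")
    print(f"mean relative error = {mean_relative_error_pct(bid_costs, model_costs):.2f}%")
```

Reporting a signed error alongside R and RMSE is useful because the latter two cannot show whether a model is biased high or low.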


Leaf Area Index Estimation Algorithm for GF-5 Hyperspectral Data Based on Different Feature Selection and Machine Learning Methods

Remote Sensing ◽

10.3390/rs12132110 ◽

2020 ◽

Vol 12 (13) ◽

pp. 2110

Author(s):

Zhulin Chen ◽

Kun Jia ◽

Chenchao Xiao ◽

Dandan Wei ◽

Xiang Zhao ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Leaf Area Index ◽

Leaf Area ◽

Energy Utilization ◽

Estimation Accuracy ◽

Soil Conditions ◽

Area Index ◽

Estimation Models

Leaf area index (LAI) is an essential vegetation parameter that represents the light energy utilization and vegetation canopy structure. As the only in-operation hyperspectral satellite launched by China, GF-5 is potentially useful for accurate LAI estimation. However, no research has focused on evaluating GF-5 data for LAI estimation. Hyperspectral remote sensing data contain abundant information about the reflective characteristics of vegetation canopies, but these abundant data also easily result in the curse of dimensionality. Therefore, feature selection (FS) is necessary to reduce data redundancy and achieve more reliable estimations. Currently, machine learning (ML) algorithms are widely used for FS. Moreover, the same ML algorithm is usually used for both FS and regression in LAI estimation. However, no evidence suggests that this is the optimal solution. Therefore, this study focuses on evaluating the capacity of GF-5 spectral reflectance for estimating LAI and the performance of different combinations of FS and ML algorithms. First, the PROSAIL model, which couples the leaf optical properties model PROSPECT with the scattering by arbitrarily inclined leaves (SAIL) model, was used to generate simulated GF-5 reflectance data under different vegetation and soil conditions. Then three FS methods (random forest (RF), K-means clustering (K-means), and mean impact value (MIV)) and three ML algorithms (random forest regression (RFR), back propagation neural network (BPNN), and K-nearest neighbor (KNN)) were used to develop nine LAI estimation models. The FS process was conducted twice using different strategies. First, the three FS methods were used to search for the lowest number of dimensions that maintained the estimation accuracy achieved with all bands. Then, the sequential backward selection (SBS) method was used to eliminate the bands having minimal impact on LAI estimation accuracy. Finally, the three best estimation models were selected and evaluated using reference LAI. The results showed that although the RF_RFR model (RF used for feature selection and RFR used for regression) achieved reliable LAI estimates (coefficient of determination (R²) = 0.828, root mean square error (RMSE) = 0.839), the poor performance (R² = 0.763, RMSE = 0.987) of the MIV_BPNN model (MIV used for feature selection and BPNN used for regression) suggested that feature selection and regression conducted by the same ML algorithm could not always ensure an optimal estimation. Moreover, RF selection preserved the most informative bands for LAI estimation so that each ML regression method could achieve satisfactory estimation results. Finally, the results indicated that the RF_KNN model (RF used for feature selection and KNN used for regression) with seven GF-5 spectral band reflectances achieved better estimation results than the others when validated by simulated data (R² = 0.834, RMSE = 0.824) and actual reference LAI (R² = 0.659, RMSE = 0.697).
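The idea of sequential backward selection wrapped around a K-nearest-neighbor regressor can be sketched in a few lines. This is a toy illustration under stated assumptions, not the study's procedure: scoring by leave-one-out RMSE, k = 3, and the rule of stopping once removing any band would increase the error are all choices made here, and every function name is hypothetical.

```python
import math


def knn_predict(train_rows, train_targets, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    ranked = sorted((math.dist(row, query), t) for row, t in zip(train_rows, train_targets))
    nearest = ranked[:k]
    return sum(t for _, t in nearest) / len(nearest)


def loo_rmse(rows, targets, features, k=3):
    """Leave-one-out RMSE of the KNN regressor restricted to the given feature indices."""
    squared_error = 0.0
    for i in range(len(rows)):
        train_rows = [[r[f] for f in features] for j, r in enumerate(rows) if j != i]
        train_targets = [t for j, t in enumerate(targets) if j != i]
        query = [rows[i][f] for f in features]
        squared_error += (knn_predict(train_rows, train_targets, query, k) - targets[i]) ** 2
    return math.sqrt(squared_error / len(rows))


def sequential_backward_selection(rows, targets, min_features=1, k=3):
    """Repeatedly drop the feature whose removal gives the lowest error.

    Stops when every possible removal would increase the error
    (an assumed stopping rule) or when min_features remain.
    """
    features = list(range(len(rows[0])))
    best = loo_rmse(rows, targets, features, k)
    while len(features) > min_features:
        candidates = [
            (loo_rmse(rows, targets, [f for f in features if f != drop], k), drop)
            for drop in features
        ]
        score, drop = min(candidates)
        if score > best:
            break
        features.remove(drop)
        best = score
    return features, best


if __name__ == "__main__":
    # Synthetic data: column 0 drives the target, column 1 is unrelated.
    rows = [[float(i), float((7 * i) % 10) * 10.0] for i in range(10)]
    targets = [2.0 * i for i in range(10)]
    kept, error = sequential_backward_selection(rows, targets)
    print("kept feature indices:", kept, "LOO RMSE:", round(error, 3))
```

Keeping the selection step separate from the regressor makes it easy to pair any selector with any regression model, which is the kind of combination the study compares.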


Combination of Feature Selection and CatBoost for Prediction: The First Application to the Estimation of Aboveground Biomass

Forests ◽

10.3390/f12020216 ◽

2021 ◽

Vol 12 (2) ◽

pp. 216

Author(s):

Mi Luo ◽

Yifu Wang ◽

Yunhong Xie ◽

Lai Zhou ◽

Jingjing Qiao ◽

...

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forests ◽

Aboveground Biomass ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Machine Learning Algorithm ◽

Selection Methods ◽

Estimation Models

Increasing numbers of explanatory variables tend to result in information redundancy and “dimensional disaster” in the quantitative remote sensing of forest aboveground biomass (AGB). Feature selection of model factors is an effective method for improving the accuracy of AGB estimates. Machine learning algorithms are also widely used in AGB estimation, although little research has addressed the use of the categorical boosting algorithm (CatBoost) for AGB estimation. Both feature selection and regression for AGB estimation models are typically performed with the same machine learning algorithm, but there is no evidence to suggest that this is the best method. Therefore, the present study focuses on evaluating the performance of the CatBoost algorithm for AGB estimation and comparing the performance of different combinations of feature selection methods and machine learning algorithms. AGB estimation models of four forest types were developed based on Landsat OLI data using three feature selection methods (recursive feature elimination (RFE), variable selection using random forests (VSURF), and least absolute shrinkage and selection operator (LASSO)) and three machine learning algorithms (random forest regression (RFR), extreme gradient boosting (XGBoost), and categorical boosting (CatBoost)). Feature selection had a significant influence on AGB estimation. RFE preserved the most informative features for AGB estimation and was superior to VSURF and LASSO. In addition, CatBoost improved the accuracy of the AGB estimation models compared with RFR and XGBoost. AGB estimation models using RFE for feature selection and CatBoost as the regression algorithm achieved the highest accuracy, with root mean square errors (RMSEs) of 26.54 Mg/ha for coniferous forest, 24.67 Mg/ha for broad-leaved forest, 22.62 Mg/ha for mixed forests, and 25.77 Mg/ha for all forests. The combination of RFE and CatBoost had better performance than the VSURF–RFR combination in which random forests were used for both feature selection and regression, indicating that feature selection and regression performed by a single machine learning algorithm may not always ensure optimal AGB estimation. Extending the application of new machine learning algorithms and feature selection methods is a promising way to improve the accuracy of AGB estimates.


Estimation of Hurricane Intensity from ATMS-Derived Temperature Anomaly using Machine Learning

Global Journal of Science Frontier Research ◽

10.34257/gjsfrhvol20is4pg5 ◽

2020 ◽

pp. 5-15

Author(s):

Lin Lin

Keyword(s):

Machine Learning ◽

Core Structure ◽

Support Vector ◽

Hurricane Intensity ◽

Sea Level Pressure ◽

Warm Core ◽

Svm Model ◽

The Mean ◽

Basic Characteristics ◽

Estimation Models

The warm-core structure is one of the basic characteristics that vary during the different stages of tropical cyclones (TCs). The warm-core structure of the TCs during 2016-2019 over the Atlantic Ocean was derived based on the observations of the ATMS onboard S-NPP. From linear regression, the mean prediction error (MPE) is 39.04 mph for the maximum sustained wind (Vmax) and 14.47 hPa for the minimum sea-level pressure (Pmin). The root-mean-square error (RMSE) is 42.70 mph for Vmax and 77.69 hPa for Pmin. Several machine learning (ML) techniques are used to develop the Atlantic TC intensity (Vmax and Pmin) estimation models. The support vector machine (SVM) model has the best performance, with an MPE of 14.62 mph for Vmax and 7.66 hPa for Pmin, and an RMSE of 19.91 mph for Vmax and 10.58 hPa for Pmin. Adding latitude and day of year (DOY) can further improve the estimation of Vmax by decreasing the MPE to 13.01 mph and the RMSE to 17.33 mph using SVM. The best estimation of Pmin occurs when adding the day of year to the training process, with an MPE of 7.23 hPa and an RMSE of 9.88 hPa. Other TC information, such as longitude and local time, does not help to improve the performance of the hurricane intensity estimation models significantly.


A Non-Invasive Continuous Blood Pressure Estimation Approach Based on Machine Learning

Sensors ◽

10.3390/s19112585 ◽

2019 ◽

Vol 19 (11) ◽

pp. 2585 ◽

Cited By ~ 11

Author(s):

Shuo Chen ◽

Zhong Ji ◽

Haiyan Wu ◽

Yingchao Xu

Keyword(s):

Machine Learning ◽

Blood Pressure ◽

Measurement Techniques ◽

Pulse Transit Time ◽

Estimation Errors ◽

British Hypertension Society ◽

Non Invasive ◽

Blood Pressure Estimation ◽

The Impact ◽

Estimation Models

Considering the existing issues of traditional blood pressure (BP) measurement methods and non-invasive continuous BP measurement techniques, this study aims to establish systolic BP and diastolic BP estimation models based on machine learning using pulse transit time and characteristics of the pulse waveform. In the process of model construction, the mean impact value method was introduced to investigate the impact of each feature on the models, and the genetic algorithm was introduced to implement parameter optimization. The experimental results showed that the proposed models could effectively describe the nonlinear relationship between the features and BP and had higher accuracy than the traditional methods, with an error of 3.27 ± 5.52 mmHg for systolic BP and 1.16 ± 1.97 mmHg for diastolic BP. Moreover, the estimation errors met the requirements of the Association for the Advancement of Medical Instrumentation and British Hypertension Society criteria. In conclusion, this study was helpful in promoting the practical application of methods for non-invasive continuous BP estimation models.
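The two criteria mentioned above are usually applied to the per-measurement errors (estimate minus reference). The sketch below uses the commonly cited thresholds: for AAMI, a mean error within ±5 mmHg with a standard deviation of at most 8 mmHg; for BHS, grades based on the cumulative percentage of errors within 5, 10, and 15 mmHg. It is a simplified illustration that ignores the protocols' requirements on subject numbers and measurement procedure. The function names are hypothetical and the abstract does not describe how the study performed this check.

```python
import statistics


def meets_aami(errors, max_mean=5.0, max_sd=8.0):
    """True if the mean error is within +/-5 mmHg and its SD is at most 8 mmHg."""
    return abs(statistics.mean(errors)) <= max_mean and statistics.stdev(errors) <= max_sd


def bhs_grade(errors):
    """BHS grade from the cumulative % of absolute errors within 5, 10 and 15 mmHg."""
    n = len(errors)
    within = [100.0 * sum(abs(e) <= limit for e in errors) / n for limit in (5, 10, 15)]
    grade_table = (
        ("A", (60, 85, 95)),
        ("B", (50, 75, 90)),
        ("C", (40, 65, 85)),
    )
    for grade, thresholds in grade_table:
        if all(w >= t for w, t in zip(within, thresholds)):
            return grade
    return "D"


if __name__ == "__main__":
    # Illustrative errors in mmHg, not data from the study.
    sample_errors = [1.0, -2.0, 3.0, -4.0, 2.0, 0.0, -1.0, 4.0, -3.0, 2.0]
    print("AAMI:", meets_aami(sample_errors), "BHS grade:", bhs_grade(sample_errors))
```

A result reported as mean ± SD, such as the systolic figure above, maps directly onto the AAMI check, whereas a BHS grade needs the full distribution of errors.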


Next generation pure component property estimation models: With and without machine learning techniques

AIChE Journal ◽

10.1002/aic.17469 ◽

2021 ◽

Author(s):

Abdulelah S. Alshehri ◽

Anjan K. Tula ◽

Fengqi You ◽

Rafiqul Gani

Keyword(s):

Machine Learning ◽

Pure Component ◽

Machine Learning Techniques ◽

Next Generation ◽

Property Estimation ◽

Learning Techniques ◽

Component Property ◽

Estimation Models


Vehicle Delay Estimation at Signalized Intersections Using Machine Learning Algorithms

Transportation Research Record Journal of the Transportation Research Board ◽

10.1177/03611981211036874 ◽

2021 ◽

pp. 036119812110368

Author(s):

Muhammed Emin Cihangir Bagdatli ◽

Ahmet Sakir Dokuz

Keyword(s):

Machine Learning ◽

Accurate Determination ◽

Signalized Intersections ◽

Machine Learning Algorithms ◽

Delay Estimation ◽

Gradient Boosting ◽

Support Vector ◽

Extreme Gradient Boosting ◽

Vehicle Delay ◽

Estimation Models

Accurate determination of average vehicle delays is significant for effective management of a signalized intersection. Vehicle delays can be determined by field studies; however, this approach is costly and time consuming. Analytical methods, which are commonly utilized to estimate delay, cannot generate accurate estimates, especially in oversaturated traffic flow conditions. Delay estimation models based on artificial intelligence have been presented in the literature in recent years to estimate the delay more accurately. However, the number of artificial/heuristic intelligence techniques utilized for vehicle delay estimation is limited in the literature. In this study, estimation models are developed using four different machine learning methods, namely support vector regression (SVR), random forest (RF), k nearest neighbor (kNN), and extreme gradient boosting (XGBoost), that have not previously been applied in the literature for vehicle delay estimation at signalized intersections. The models were tested with data collected from 12 signalized intersections located in Ankara, the capital of Turkey, and the performance of the models was evaluated. The models were furthermore compared with successful delay models from the literature. The developed models, in particular the RF and XGBoost models, showed high performance in estimating the delay at signalized intersections under different traffic conditions. The results indicate that the delay estimation models based on the RF and XGBoost techniques can significantly contribute to both the literature and practice.
