The Predictive Capability of a Novel Ensemble Tree-Based Algorithm for Assessing Groundwater Potential

Understanding the distribution of potential groundwater resources is critical for sustainable groundwater development, conservation, and management. This study analyzes and maps groundwater potential in Busan Metropolitan City, South Korea, using random forest (RF), gradient boosting machine (GBM), and extreme gradient boosting (XGB) models. The contribution of fourteen groundwater conditioning factors to groundwater potential assessment was evaluated using an elastic net. Curvature, the stream power index, distance from drainage, lineament density, and fault density were excluded from the subsequent analysis, and the remaining nine factors were used to create groundwater potential maps (GPMs) with the RF, GBM, and XGB models. The accuracy of the resulting GPMs was tested and compared using receiver operating characteristic curves and the seed cell area index. All three models satisfactorily predicted the spatial distribution of groundwater in the study area. The XGB model showed the highest prediction accuracy (0.818), followed by the GBM (0.802) and RF (0.794) models. XGB, the most recently developed of the three techniques, contributed most to improving GPM accuracy. These results can support a sustainable management plan for groundwater resources in the study area.
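
Several entries in this listing report the area under the ROC curve (AUC) as their main accuracy figure. As a minimal sketch, assuming binary presence/absence labels and a continuous model score (the sample data below are hypothetical, not taken from any of these studies), the AUC equals the probability that a randomly chosen positive location scores higher than a randomly chosen negative one:

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC computed as the Mann-Whitney probability P(score_pos > score_neg).

    labels: 1 for presence (e.g. a productive well), 0 for absence.
    scores: model output; higher means presence is more likely.
    A tie between a positive and a negative score counts as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical validation points: 1 = groundwater present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.5, 0.3, 0.1]
print(round(roc_auc(labels, scores), 3))  # 8 of 9 pairs ranked correctly -> 0.889
```

The pairwise loop is O(n·m), which is fine for a few hundred validation points; a rank-based formulation scales better for large rasters.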

Harris Hawks Optimization: A Novel Swarm Intelligence Technique for Spatial Assessment of Landslide Susceptibility

Sensors ◽

10.3390/s19163590 ◽

2019 ◽

Vol 19 (16) ◽

pp. 3590 ◽

Cited By ~ 38

Author(s):

Bui ◽

Moayedi ◽

Kalantar ◽

Osouli ◽

Gör ◽

...

Keyword(s):

Landslide Susceptibility ◽

Characteristic Curve ◽

Absolute Error ◽

Slope Aspect ◽

Stream Power ◽

Topographic Wetness Index ◽

Conditioning Factors ◽

Performance Error ◽

Stream Power Index ◽

Artificial Neural Network (ANN)

In this research, the novel metaheuristic algorithm Harris hawks optimization (HHO) is applied to landslide susceptibility analysis in western Iran. To this end, HHO is coupled with an artificial neural network (ANN) to optimize the network's performance. A spatial database comprising 208 historical landslides and 14 landslide conditioning factors (elevation, slope aspect, plan curvature, profile curvature, soil type, lithology, distance to the river, distance to the road, distance to the fault, land cover, slope degree, stream power index (SPI), topographic wetness index (TWI), and rainfall) is prepared to develop the ANN and HHO–ANN predictive tools. Mean square error and mean absolute error criteria are used to measure the performance error of the models, and the area under the receiver operating characteristic curve (AUROC) is used to evaluate the accuracy of the generated susceptibility maps. The findings show that the HHO algorithm effectively improved the performance of the ANN in both recognizing (AUROC(ANN) = 0.731, AUROC(HHO–ANN) = 0.777) and predicting (AUROC(ANN) = 0.720, AUROC(HHO–ANN) = 0.773) the landslide pattern.
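
The abstract names mean square error and mean absolute error as the performance-error criteria. For reference, a minimal sketch of both in plain Python, assuming targets coded 1 for landslide and 0 for non-landslide (the sample values are hypothetical):

```python
from typing import Sequence


def mean_square_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of squared differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average of absolute differences between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and equal in length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets (1 = landslide, 0 = non-landslide) and network outputs.
y_true = [1, 0, 1, 0]
y_pred = [0.8, 0.3, 0.6, 0.1]
print(mean_square_error(y_true, y_pred))    # approximately 0.075
print(mean_absolute_error(y_true, y_pred))  # approximately 0.25
```

Squaring makes MSE penalize a few large errors more heavily than MAE does, which is why the two criteria are often reported together.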

Simulating the Leaf Area Index of Rice from Multispectral Images

Remote Sensing ◽

10.3390/rs13183663 ◽

2021 ◽

Vol 13 (18) ◽

pp. 3663

Author(s):

Shenzhou Liu ◽

Wenzhi Zeng ◽

Lifeng Wu ◽

Guoqing Lei ◽

Haorui Chen ◽

...

Keyword(s):

Leaf Area Index ◽

Leaf Area ◽

Vegetation Index ◽

Normalized Difference Vegetation Index ◽

Estimation Accuracy ◽

Gradient Boosting ◽

Accurate Estimation ◽

Area Index ◽

Red Edge ◽

Extreme Gradient Boosting

Accurate estimation of the leaf area index (LAI) is essential for crop growth simulation and agricultural management. This study conducted a field experiment with rice and measured the LAI in different rice growth periods. Multispectral bands (B), including red edge (RE, 730 nm ± 16 nm), near-infrared (NIR, 840 nm ± 26 nm), green (560 nm ± 16 nm), red (650 nm ± 16 nm), and blue (450 nm ± 16 nm), as well as visible light (RGB) imagery, were obtained by an unmanned aerial vehicle (UAV) with multispectral sensors (DJI-P4M, SZ DJI Technology Co., Ltd.). From these bands, five vegetation indices (VIs) were calculated: the Green Normalized Difference Vegetation Index (GNDVI), Leaf Chlorophyll Index (LCI), Normalized Difference Red Edge Index (NDRE), Normalized Difference Vegetation Index (NDVI), and Optimized Soil-Adjusted Vegetation Index (OSAVI). A semi-empirical model (SEM), a random forest model (RF), and an Extreme Gradient Boosting model (XGBoost) were used to estimate rice LAI from the multispectral bands, the VIs, and their combinations. GNDVI had the highest accuracy in the SEM (R² = 0.78, RMSE = 0.77). Among single bands, NIR had the highest accuracy in both RF (R² = 0.73, RMSE = 0.98) and XGBoost (R² = 0.77, RMSE = 0.88). The NIR + red band combination improved the estimation accuracy in both RF (R² = 0.87, RMSE = 0.65) and XGBoost (R² = 0.88, RMSE = 0.63). NDRE and LCI were the two best-performing single VIs for LAI estimation with both RF and XGBoost. However, combining more than one VI increased the LAI estimation accuracy only slightly. Meanwhile, combining bands with VIs improved accuracy in both RF and XGBoost. We recommend estimating rice LAI from the combination red + NIR + OSAVI + NDVI + GNDVI + LCI + NDRE (2B + 5V) with XGBoost, which achieved high accuracy (R² = 0.91, RMSE = 0.54) while mitigating the potential over-fitting issue.
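
The five indices are simple band ratios. A minimal sketch computing them for one pixel, assuming reflectances scaled to 0-1; the NDVI, GNDVI, and NDRE forms are the standard normalized differences, while the OSAVI soil factor (0.16) and the LCI form are assumptions, since published variants differ and the abstract does not give the formulas used:

```python
from typing import Dict


def vegetation_indices(green: float, red: float, red_edge: float, nir: float) -> Dict[str, float]:
    """Compute the five VIs named in the abstract from band reflectances (0-1).

    NDVI, GNDVI and NDRE use the standard normalized-difference form.
    OSAVI (soil factor 0.16) and LCI are assumed forms; check them against
    the formulas the study actually used.
    """
    return {
        "NDVI": (nir - red) / (nir + red),
        "GNDVI": (nir - green) / (nir + green),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "OSAVI": (nir - red) / (nir + red + 0.16),
        "LCI": (nir - red_edge) / (nir + red),
    }


# Hypothetical canopy reflectances for one pixel.
for name, value in vegetation_indices(green=0.2, red=0.1, red_edge=0.3, nir=0.5).items():
    print(f"{name}: {value:.3f}")
```

Index values like these, alone or alongside the raw bands, form the feature vectors that the SEM, RF, and XGBoost models map to LAI.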

Assessment of Landslide Susceptibility by Decision Trees in the Metropolitan Area of Istanbul, Turkey

Mathematical Problems in Engineering ◽

10.1155/2010/901095 ◽

2010 ◽

Vol 2010 ◽

pp. 1-15 ◽

Cited By ~ 129

Author(s):

H. A. Nefeslioglu ◽

E. Sezer ◽

C. Gokceoglu ◽

A. S. Bozkir ◽

T. Y. Duman

Keyword(s):

Decision Tree ◽

Metropolitan Area ◽

Landslide Susceptibility ◽

Susceptibility Map ◽

Northern Coast ◽

Stream Power ◽

Landslide Susceptibility Map ◽

Conditioning Factors ◽

Stream Power Index ◽

Predicted Values

The main purpose of the present study is to investigate the possible application of decision trees in landslide susceptibility assessment. The study area, with a surface area of 174.8 km², is located on the northern coast of the Sea of Marmara in the western part of the Istanbul metropolitan area. When applying data mining and extracting the decision tree, geological formations, altitude, slope, plan curvature, profile curvature, heat load, and stream power index were taken into consideration as landslide conditioning factors. The landslide susceptibility map of the study area was produced from the predicted values. The AUC of the produced landslide susceptibility map is 89.6%, indicating that the map performs sufficiently well.
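
The abstract does not specify the tree-induction algorithm. As an illustrative sketch, assuming a CART-style tree with the Gini impurity criterion (not necessarily what the authors used), the core step is choosing, for one conditioning factor, the threshold that best separates landslide from non-landslide cells:

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a set of binary labels (0 = pure, 0.5 = evenly mixed)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted Gini) of the best split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split is possible between equal values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


# Hypothetical slope angles (degrees) and landslide labels (1 = landslide).
slope = [5, 8, 12, 20, 25, 30]
landslide = [0, 0, 0, 1, 1, 1]
print(best_threshold(slope, landslide))  # (16.0, 0.0): a perfect split
```

A full tree applies this search recursively over all factors and stops on depth or node-size limits; the leaf class proportions then serve as the predicted susceptibility values.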

Improvement of Credal Decision Trees Using Ensemble Frameworks for Groundwater Potential Modeling

Sustainability ◽

10.3390/su12072622 ◽

2020 ◽

Vol 12 (7) ◽

pp. 2622 ◽

Cited By ~ 8

Author(s):

Phong Tung Nguyen ◽

Duong Hai Ha ◽

Huu Duy Nguyen ◽

Tran Van Phong ◽

Phan Trong Trinh ◽

...

Keyword(s):

Decision Trees ◽

Flow Direction ◽

Groundwater Resources ◽

Groundwater Potential ◽

Hybrid Models ◽

Topographic Wetness Index ◽

Conditioning Factors ◽

Statistical Measures ◽

The World ◽

Potential Mapping

Groundwater is one of the most important sources of fresh water worldwide, especially in countries with erratic rainfall, such as Vietnam. Machine learning (ML) models are increasingly used to assess regional groundwater potential, and credal decision trees (CDT) are one of the ML models used in such studies. In the present study, the performance of CDT was improved using the ensemble frameworks Bagging, Dagging, Decorate, Multiboost, and Random SubSpace. Based on these methods, five hybrid models, namely BCDT, Dagging-CDT, Decorate-CDT, MBCDT, and RSSCDT, were developed and applied for groundwater potential mapping of DakLak province, Vietnam. Data from 227 groundwater wells in the study area were used to construct and validate the models. Twelve groundwater potential conditioning factors were considered: rainfall, slope, elevation, river density, Sediment Transport Index (STI), curvature, flow direction, aspect, soil, land use, Topographic Wetness Index (TWI), and geology. Various statistical measures, including the area under the receiver operating characteristic curve (AUC), were applied to validate and compare the models. All of the hybrid CDT ensemble models, MBCDT (AUC = 0.770), BCDT (AUC = 0.731), Dagging-CDT (AUC = 0.763), Decorate-CDT (AUC = 0.750), and RSSCDT (AUC = 0.766), improved significantly on the single CDT model (AUC = 0.722). These hybrid models can therefore be applied for better groundwater potential mapping and groundwater resources management in the study area and other regions of the world.
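
The abstract lists the ensemble frameworks but not their mechanics. As a sketch of the Bagging idea only, each ensemble member is trained on a bootstrap resample of the training wells and predictions are combined by majority vote. The base learner here is a hypothetical one-threshold rule on a single factor, swapped in for simplicity; it is not a credal decision tree, which uses an imprecise-probability split criterion:

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, int]  # (value of one conditioning factor, class label 0/1)


def fit_stump(data: Sequence[Sample]) -> Callable[[float], int]:
    """Hypothetical base learner: the one-threshold rule with fewest training errors."""
    values = sorted({x for x, _ in data})
    best_err, best_rule = len(data) + 1, (values[0], 1)
    for thr in values:
        for high_label in (0, 1):
            err = sum((high_label if x > thr else 1 - high_label) != y for x, y in data)
            if err < best_err:
                best_err, best_rule = err, (thr, high_label)
    thr, high_label = best_rule
    return lambda x: high_label if x > thr else 1 - high_label


def bagging(data: Sequence[Sample], n_estimators: int = 25, seed: int = 0) -> Callable[[float], int]:
    """Train n_estimators stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models: List[Callable[[float], int]] = []
    for _ in range(n_estimators):
        boot = [rng.choice(data) for _ in data]  # same size, drawn with replacement
        models.append(fit_stump(boot))

    def predict(x: float) -> int:
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]  # odd n_estimators avoids binary ties

    return predict


# Hypothetical training data: factor value and well class (1 = high yield).
data = [(1, 0), (2, 0), (3, 0), (4, 0), (6, 1), (7, 1), (8, 1), (9, 1)]
predict = bagging(data)
print(predict(0.5), predict(8.0))
```

Resampling makes each member see a slightly different training set, and voting averages out their individual instability; Dagging, Decorate, MultiBoost, and Random Subspace vary how that diversity is created.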

Integrated Geomorphological, Geospatial and AHP Technique for Groundwater Prospects Mapping in Basaltic Terrain

10.21523/gcj3.18020102 ◽

2018 ◽

Vol 2 (1) ◽

pp. 16-27 ◽

Cited By ~ 11

Author(s):

Vaishnavi Mundalik ◽

Clinton Fernandes ◽

Ajaykumar Kadam ◽

Bhavana Umrikar

Keyword(s):

Groundwater Resources ◽

Drainage Density ◽

Groundwater Potential ◽

The Sustainable Development ◽

Thematic Layers ◽

Groundwater Availability ◽

Basaltic Terrain ◽

Thematic Layer ◽

Increasing Demand ◽

GIS Environment

Groundwater is an important source of drinking water in rural India. Because demand for water is increasing, it is essential to identify new sources and develop this resource sustainably. Potential mapping and exploration of groundwater resources have become an important advance in hydrogeological research. In this paper, a groundwater prospects map is delineated to assess groundwater availability in the Kar basin, on basaltic terrain, using remote sensing and Geographic Information System (GIS) techniques. Thematic layers including geology, slope, soil, geomorphology, drainage density, and rainfall were prepared from satellite data, topographic maps, and field data. Using the AHP technique, ranks and weights were assigned to the thematic layers and to the categories within each layer, respectively. A weighted overlay analysis of the reclassified layers was then performed in the GIS environment to prepare the groundwater potential map of the study area. The resulting groundwater prospects map is classified into three classes, low, moderate, and high, covering 17.12%, 38.26%, and 44.62% of the area, respectively. The overlay map of groundwater potential zones in the study area was found to be helpful for better planning and management of the resource.
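
A minimal sketch of the AHP-plus-overlay workflow, assuming a common variant (not necessarily the authors' exact procedure): layer weights come from a pairwise comparison matrix via the geometric-mean approximation of Saaty's principal eigenvector, the consistency ratio checks the judgments, and each cell's score is the weighted sum of its class ranks. The comparison values and ranks are hypothetical:

```python
import math
from typing import List, Sequence

# Saaty's random consistency index for matrix sizes 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix: Sequence[Sequence[float]]) -> List[float]:
    """Normalized row geometric means (approximates the principal eigenvector)."""
    n = len(matrix)
    geo = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo)
    return [g / total for g in geo]


def consistency_ratio(matrix: Sequence[Sequence[float]], weights: Sequence[float]) -> float:
    """CR = CI / RI; values below about 0.1 are usually considered acceptable."""
    n = len(matrix)
    if n < 3:
        return 0.0  # 1x1 and 2x2 matrices are always consistent
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n - 1]


def weighted_overlay(ranks: Sequence[float], weights: Sequence[float]) -> float:
    """Groundwater potential score of one cell: sum of layer weight * class rank."""
    return sum(r * w for r, w in zip(ranks, weights))


# Hypothetical pairwise comparisons for three layers: geology, slope, drainage density.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
print([round(w, 3) for w in weights])  # approximately [0.571, 0.286, 0.143]
print(consistency_ratio(pairwise, weights))  # close to 0 for this consistent matrix
# Hypothetical class ranks (1 = low, 3 = high potential) of one cell in each layer.
print(weighted_overlay([3, 2, 1], weights))
```

Slicing the resulting score raster into low, moderate, and high classes then yields a prospects map of the kind described above.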

Predicting Undesired Treatment Outcome in Mental Healthcare: Machine Learning Study (Preprint)

10.2196/preprints.17235 ◽

2019 ◽

Author(s):

Kasper Van Mens ◽

Joran Lokkerbol ◽

Richard Janssen ◽

Robert de Lange ◽

Bea Tiemens

Keyword(s):

Machine Learning ◽

Treatment Outcome ◽

Mental Health Treatment ◽

Mental Healthcare ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

Trade Off ◽

Trade Offs ◽

Outcome Monitoring ◽

Extreme Gradient Boosting

BACKGROUND It remains a challenge to predict which treatment will work for which patient in mental healthcare. OBJECTIVE In this study we compare machine learning algorithms that predict, during treatment, which patients will not benefit from brief mental health treatment, and we present the trade-offs that must be considered before an algorithm can be used in clinical practice. METHODS Using an anonymized dataset containing routine outcome monitoring data from a mental healthcare organization in the Netherlands (n = 2,655), we applied three machine learning algorithms to predict treatment outcome. The algorithms were internally validated with cross-validation on a training sample (n = 1,860) and externally validated on an unseen test sample (n = 795). RESULTS The performance of the three algorithms did not differ significantly on the test set. With a default classification cut-off at 0.5 predicted probability, the extreme gradient boosting algorithm showed the highest positive predictive value (PPV) of 0.71 (0.61–0.77), with a sensitivity of 0.35 (0.29–0.41) and an area under the curve of 0.78. A trade-off can be made between PPV and sensitivity by choosing different cut-off probabilities. With a cut-off at 0.63, the PPV increased to 0.87 and the sensitivity dropped to 0.17. With a cut-off at 0.38, the PPV decreased to 0.61 and the sensitivity increased to 0.57. CONCLUSIONS Machine learning can be used to predict treatment outcomes based on routine monitoring data. This allows practitioners to choose their own trade-off between being selective and more certain versus inclusive and less certain.
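
The cut-off trade-off can be reproduced on any set of predicted probabilities. A minimal sketch, assuming a patient is flagged as "will not benefit" when the predicted probability is at or above the cut-off (the labels and probabilities below are hypothetical, not the study's data):

```python
from typing import Sequence, Tuple


def ppv_sensitivity(labels: Sequence[int], probs: Sequence[float], cutoff: float) -> Tuple[float, float]:
    """PPV and sensitivity when cases with prob >= cutoff are predicted positive.

    labels: 1 = undesired outcome (no benefit), 0 = benefit.
    Returns NaN for a metric whose denominator is zero.
    """
    tp = sum(1 for y, p in zip(labels, probs) if p >= cutoff and y == 1)
    fp = sum(1 for y, p in zip(labels, probs) if p >= cutoff and y == 0)
    fn = sum(1 for y, p in zip(labels, probs) if p < cutoff and y == 1)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    return ppv, sensitivity


# Hypothetical outcomes and predicted probabilities for ten patients.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.55, 0.3, 0.6, 0.45, 0.4, 0.2, 0.1, 0.05]
for cutoff in (0.25, 0.5, 0.65):
    ppv, sens = ppv_sensitivity(labels, probs, cutoff)
    print(f"cut-off {cutoff:.2f}: PPV {ppv:.2f}, sensitivity {sens:.2f}")
```

Raising the cut-off flags fewer patients, so PPV tends to rise while sensitivity falls, mirroring the selective-versus-inclusive choice described in the conclusions.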

XGBoost and Network Analysis for Prediction of Proteins Affecting Insulin based on Protein Protein Interactions

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control ◽

10.22219/kinetik.v5i4.1076 ◽

2020 ◽

pp. 253-262

Author(s):

Mohammad Hamim Zajuli Al Faroby ◽

Mohammad Isa Irawan ◽

Ni Nyoman Tri Puspaningsih

Keyword(s):

Protein Interactions ◽

Interaction Analysis ◽

Synthesis Process ◽

Gradient Boosting ◽

Protein Protein Interactions ◽

Central Function ◽

Extreme Gradient Boosting ◽

Main Protein ◽

The Right ◽

ROC Score

Protein–protein interaction (PPI) analysis can be used to identify proteins that support the function of a main protein, especially in its synthesis process. Insulin is synthesized by proteins that share the same molecular function but play different, mutually supportive roles. To identify this function, translating Gene Ontology (GO) terms assigns characteristics to each protein. This study aims to predict proteins that interact with insulin, using the centrality method as a feature extractor and extreme gradient boosting as the classification algorithm. The centrality method produces features that describe the central role of each protein in the network. Classification results are measured using accuracy, precision, recall, and ROC scores. Optimizing the model by finding the right parameters produces an accuracy of and a ROC score of . The prediction model produced by XGBoost performs above the average of other machine learning methods.
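
The abstract does not say which centrality measures were used. As a sketch, assuming degree and closeness centrality (two common choices) computed on an unweighted interaction network, each protein's values could serve as features for the classifier. The protein identifiers other than insulin's gene symbol INS are hypothetical placeholders:

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from a list of interaction pairs."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def degree_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """Number of neighbours divided by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph: Dict[str, Set[str]]) -> Dict[str, float]:
    """(Reachable nodes - 1) / sum of BFS shortest-path distances to them."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


# Hypothetical interaction network around insulin (INS); P1..P4 are placeholders.
edges = [("INS", "P1"), ("INS", "P2"), ("INS", "P3"), ("P1", "P4")]
g = build_graph(edges)
print(degree_centrality(g))
print(closeness_centrality(g))
```

Each protein's centrality values can be stacked into a feature row, together with any GO-derived attributes, and passed to a gradient boosting classifier.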

Evaluation of Three Different Machine Learning Methods for Object-Based Artificial Terrace Mapping—A Case Study of the Loess Plateau, China

Remote Sensing ◽

10.3390/rs13051021 ◽

2021 ◽

Vol 13 (5) ◽

pp. 1021

Author(s):

Hu Ding ◽

Jiaming Na ◽

Shangjing Jiang ◽

Jie Zhu ◽

Kai Liu ◽

...

Keyword(s):

Machine Learning ◽

Random Forest ◽

Loess Plateau ◽

Water Conservation ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

The Loess Plateau ◽

Object Based ◽

Extreme Gradient Boosting

Artificial terraces are of great importance for agricultural production and soil and water conservation. Automatic, high-accuracy mapping of artificial terraces is the basis for monitoring and related studies. Previous research mapped artificial terraces from high-resolution digital elevation models (DEMs) or imagery. Because contextual information is important for terrace mapping, object-based image analysis (OBIA) combined with machine learning (ML) is widely used. However, selecting an appropriate classifier is critical for the terrace mapping task. In this study, the performance of an integrated OBIA and ML framework for terrace mapping was tested in the Zhifanggou catchment on the Loess Plateau, China. First, optimized image segmentation was conducted. Then, features were extracted from the DEMs and imagery, and the correlations between features were analyzed and ranked for classification. Finally, three commonly used ML classifiers, extreme gradient boosting (XGBoost), random forest (RF), and k-nearest neighbor (KNN), were used for terrace mapping. Compared with the ground truth delineated by field survey, random forest performed best, with 95.60% overall accuracy, followed by XGBoost (94.16%) and KNN (92.33%). The influence of class imbalance and feature selection is discussed. This work provides a credible framework for mapping artificial terraces.
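
Overall accuracy alone can hide poor results on a minority class, which is why class imbalance matters here. A minimal sketch, assuming per-object labels compared with field-survey ground truth (the labels below are hypothetical), contrasting overall accuracy with per-class producer's accuracy (recall):

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def overall_accuracy(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> float:
    """Fraction of objects whose predicted class matches the ground truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def producers_accuracy(truth: Sequence[Hashable], pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Per-class recall: correctly labelled objects / ground-truth objects of that class."""
    totals = Counter(truth)
    correct = Counter(t for t, p in zip(truth, pred) if t == p)
    return {c: correct[c] / totals[c] for c in totals}


# Hypothetical, imbalanced example: 3 terrace objects, 7 other objects.
truth = ["terrace"] * 3 + ["other"] * 7
pred = ["terrace", "terrace", "other"] + ["other"] * 7
print(overall_accuracy(truth, pred))    # 0.9
print(producers_accuracy(truth, pred))  # terrace recall is only 2/3
```

Reporting both figures makes it clear whether a high overall accuracy is driven mainly by the majority background class.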

Shallow urban aquifers under hyper-recharge equatorial conditions and strong anthropogenic constrains. Implications in terms of groundwater resources potential and integrated water resources management strategies

The Science of The Total Environment ◽

10.1016/j.scitotenv.2020.143887 ◽

2021 ◽

Vol 757 ◽

pp. 143887

Author(s):

B. Nlend ◽

H. Celle-Jeanton ◽

F. Huneau ◽

E. Garel ◽

S. Ngo Boum-Nkot ◽

...

Keyword(s):

Water Resources ◽

Water Resources Management ◽

Groundwater Resources ◽

Management Strategies ◽

Integrated Water Resources Management ◽

Resources Management

Computational Intelligence-Based Model for Mortality Rate Prediction in COVID-19 Patients

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18126429 ◽

2021 ◽

Vol 18 (12) ◽

pp. 6429

Author(s):

Irfan Ullah Khan ◽

Nida Aslam ◽

Malak Aljabri ◽

Sumayh S. Aljameel ◽

Mariam Moataz Aly Kamaleldin ◽

...

Keyword(s):

Mortality Rate ◽

Computational Intelligence ◽

Nearest Neighbor ◽

Gradient Boosting ◽

K Nearest Neighbor ◽

Detection And Identification ◽

Proposed Model ◽

Extreme Gradient Boosting ◽

The World ◽

Detection And Diagnosis

The COVID-19 outbreak is currently one of the biggest challenges facing countries around the world. Millions of people have lost their lives to COVID-19. Accurate early detection and identification of severe COVID-19 cases can therefore reduce the mortality rate and the likelihood of further complications. Machine learning (ML) and deep learning (DL) models have been shown to be effective in detecting and diagnosing several diseases, including COVID-19. This study used ML algorithms, namely decision tree (DT), logistic regression (LR), random forest (RF), extreme gradient boosting (XGBoost), and k-nearest neighbor (KNN), and a DL model (six layers with ReLU activation and an output layer with sigmoid activation) to predict the mortality rate in COVID-19 cases. The models were trained on data from confirmed COVID-19 patients in 146 countries. The ML and DL models were compared using a reduced feature set. The proposed DL model achieved the best results, with an accuracy of 0.97. Experimental results show the advantage of the proposed model over the baseline study in the literature when using the reduced feature set.
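
The DL architecture is described only briefly. A minimal forward-pass sketch under one reading of it (six fully connected ReLU layers followed by a single sigmoid output unit), with hypothetical layer sizes and random, untrained weights standing in for learned ones:

```python
import math
import random
from typing import List, Sequence, Tuple

# One dense layer: (weights[output][input], biases[output]).
Layer = Tuple[List[List[float]], List[float]]


def dense(x: Sequence[float], layer: Layer) -> List[float]:
    """Affine transform W x + b for one fully connected layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: Sequence[float], hidden: List[Layer], output: Layer) -> float:
    """ReLU hidden layers followed by a single sigmoid output unit."""
    h = list(x)
    for layer in hidden:
        h = [max(0.0, a) for a in dense(h, layer)]  # ReLU activation
    return sigmoid(dense(h, output)[0])


def random_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Hypothetical random initialisation; a trained model would learn these weights."""
    return ([[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out)


rng = random.Random(42)
n_features, width = 4, 8  # hypothetical sizes
hidden = [random_layer(n_features, width, rng)] + [random_layer(width, width, rng) for _ in range(5)]
output = random_layer(width, 1, rng)
p = forward([0.2, 1.0, 0.0, 0.5], hidden, output)
print(p)  # an untrained mortality probability in (0, 1)
```

The sigmoid output is read as the probability of the positive (mortality) class and thresholded, typically at 0.5, to compute accuracy.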
