Comparative Analysis of Machine Learning Algorithms in Automatic Identification and Extraction of Water Boundaries

Monitoring open water bodies accurately is important for assessing the role of ecosystem services in the context of human survival and climate change. There are many methods available for water body extraction based on remote sensing images, such as the normalized difference water index (NDWI), modified NDWI (MNDWI), and machine learning algorithms. Based on Landsat-8 remote sensing images, this study focuses on the effects of six machine learning algorithms and three threshold methods used to extract water bodies, evaluates the transfer performance of models applied to remote sensing images in different periods, and compares the differences among these models. The results are as follows. (1) Various algorithms require different numbers of samples to reach their optimal consequence. The logistic regression algorithm requires a minimum of 110 samples. As the number of samples increases, the order of the optimal model is support vector machine, neural network, random forest, decision tree, and XGBoost. (2) The accuracy evaluation performance of each machine learning on the test set cannot represent the local area performance. (3) When these models are directly applied to remote sensing images in different periods, the AUC indicators of each machine learning algorithm for three regions all show a significant decline, with a decrease range of 0.33–66.52%, and the differences among the different algorithm performances in the three areas are obvious. Generally, the decision tree algorithm has good transfer performance among the machine learning algorithms with area under curve (AUC) indexes of 0.790, 0.518, and 0.697 in the three areas, respectively, and the average value is 0.668. The Otsu threshold algorithm is the optimal among threshold methods, with AUC indexes of 0.970, 0.617, and 0.908 in the three regions respectively and an average AUC of 0.832.

Download Full-text

Mapping Allochemical Limestone Formations in Hazara, Pakistan Using Google Cloud Architecture: Application of Machine-Learning Algorithms on Multispectral Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi10020058 ◽

2021 ◽

Vol 10 (2) ◽

pp. 58

Author(s):

Muhammad Fawad Akbar Khan ◽

Khan Muhammad ◽

Shahid Bashir ◽

Shahab Ud Din ◽

Muhammad Hanif

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Kappa Coefficient ◽

Machine Learning Algorithms ◽

Landsat 8 ◽

Sensing Data ◽

Fossiliferous Limestone

Low-resolution Geological Survey of Pakistan (GSP) maps surrounding the region of interest show oolitic and fossiliferous limestone occurrences correspondingly in Samanasuk, Lockhart, and Margalla hill formations in the Hazara division, Pakistan. Machine-learning algorithms (MLAs) have been rarely applied to multispectral remote sensing data for differentiating between limestone formations formed due to different depositional environments, such as oolitic or fossiliferous. Unlike the previous studies that mostly report lithological classification of rock types having different chemical compositions by the MLAs, this paper aimed to investigate MLAs’ potential for mapping subclasses within the same lithology, i.e., limestone. Additionally, selecting appropriate data labels, training algorithms, hyperparameters, and remote sensing data sources were also investigated while applying these MLAs. In this paper, first, oolitic (Samanasuk), fossiliferous (Lockhart and Margalla) limestone-bearing formations along with the adjoining Hazara formation were mapped using random forest (RF), support vector machine (SVM), classification and regression tree (CART), and naïve Bayes (NB) MLAs. The RF algorithm reported the best accuracy of 83.28% and a Kappa coefficient of 0.78. To further improve the targeted allochemical limestone formation map, annotation labels were generated by the fusion of maps obtained from principal component analysis (PCA), decorrelation stretching (DS), X-means clustering applied to ASTER-L1T, Landsat-8, and Sentinel-2 datasets. These labels were used to train and validate SVM, CART, NB, and RF MLAs to obtain a binary classification map of limestone occurrences in the Hazara division, Pakistan using the Google Earth Engine (GEE) platform. The classification of Landsat-8 data by CART reported 99.63% accuracy, with a Kappa coefficient of 0.99, and was in good agreement with the field validation. This binary limestone map was further classified into oolitic (Samanasuk) and fossiliferous (Lockhart and Margalla) formations by all the four MLAs; in this case, RF surpassed all the other algorithms with an improved accuracy of 96.36%. This improvement can be attributed to better annotation, resulting in a binary limestone classification map, which formed a mask for improved classification of oolitic and fossiliferous limestone in the area.

Download Full-text

Influence of Variable Selection and Forest Type on Forest Aboveground Biomass Estimation Using Machine Learning Algorithms

Forests ◽

10.3390/f10121073 ◽

2019 ◽

Vol 10 (12) ◽

pp. 1073 ◽

Cited By ~ 10

Author(s):

Li ◽

Liu

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Variable Selection ◽

Aboveground Biomass ◽

Forest Type ◽

Learning Algorithms ◽

Forest Biomass ◽

Machine Learning Algorithms ◽

Biomass Estimation ◽

Landsat 8

Forest biomass is a major store of carbon and plays a crucial role in the regional and global carbon cycle. Accurate forest biomass assessment is important for monitoring and mapping the status of and changes in forests. However, while remote sensing-based forest biomass estimation in general is well developed and extensively used, improving the accuracy of biomass estimation remains challenging. In this paper, we used China’s National Forest Continuous Inventory data and Landsat 8 Operational Land Imager data in combination with three algorithms, either the linear regression (LR), random forest (RF), or extreme gradient boosting (XGBoost), to establish biomass estimation models based on forest type. In the modeling process, two methods of variable selection, e.g., stepwise regression and variable importance-base method, were used to select optimal variable subsets for LR and machine learning algorithms (e.g., RF and XGBoost), respectively. Comfortingly, the accuracy of models was significantly improved, and thus the following conclusions were drawn: (1) Variable selection is very important for improving the performance of models, especially for machine learning algorithms, and the influence of variable selection on XGBoost is significantly greater than that of RF. (2) Machine learning algorithms have advantages in aboveground biomass (AGB) estimation, and the XGBoost and RF models significantly improved the estimation accuracy compared with the LR models. Despite that the problems of overestimation and underestimation were not fully eliminated, the XGBoost algorithm worked well and reduced these problems to a certain extent. (3) The approach of AGB modeling based on forest type is a very advantageous method for improving the performance at the lower and higher values of AGB. Some conclusions in this paper were probably different as the study area changed. The methods used in this paper provide an optional and useful approach for improving the accuracy of AGB estimation based on remote sensing data, and the estimation of AGB was a reference basis for monitoring the forest ecosystem of the study area.

Download Full-text

Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms

Machine Learning and Knowledge Extraction ◽

10.3390/make1010022 ◽

2019 ◽

Vol 1 (1) ◽

pp. 384-399 ◽

Cited By ~ 2

Author(s):

Thais de Toledo ◽

Nunzio Torrisi

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Smart Grids ◽

Learning Algorithms ◽

Electric Utility ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Communication Link

The Distributed Network Protocol (DNP3) is predominately used by the electric utility industry and, consequently, in smart grids. The Peekaboo attack was created to compromise DNP3 traffic, in which a man-in-the-middle on a communication link can capture and drop selected encrypted DNP3 messages by using support vector machine learning algorithms. The communication networks of smart grids are a important part of their infrastructure, so it is of critical importance to keep this communication secure and reliable. The main contribution of this paper is to compare the use of machine learning techniques to classify messages of the same protocol exchanged in encrypted tunnels. The study considers four simulated cases of encrypted DNP3 traffic scenarios and four different supervised machine learning algorithms: Decision tree, nearest-neighbor, support vector machine, and naive Bayes. The results obtained show that it is possible to extend a Peekaboo attack over multiple substations, using a decision tree learning algorithm, and to gather significant information from a system that communicates using encrypted DNP3 traffic.

Download Full-text

A COMPARISON OF MACHINE-LEARNING REGRESSION ALGORITHMS FOR THE ESTIMATION OF LAI USING LANDSAT - 8 SATELLITE DATA

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w16-679-2019 ◽

2019 ◽

Vol XLII-4/W16 ◽

pp. 679-683

Author(s):

V. P. Yadav ◽

R. Prasad ◽

R. Bala ◽

A. K. Vishwakarma ◽

S. A. Yadav ◽

...

Keyword(s):

Machine Learning ◽

Satellite Data ◽

Vegetation Index ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Accurate Estimation ◽

Support Vector ◽

Landsat 8 ◽

Area Index ◽

Global Circulation Models

Abstract. The leaf area index (LAI) is one of key variable of crops which plays important role in agriculture, ecology and climate change for global circulation models to compute energy and water fluxes. In the recent research era, the machine-learning algorithms have provided accurate computational approaches for the estimation of crops biophysical parameters using remotely sensed data. The three machine-learning algorithms, random forest regression (RFR), support vector regression (SVR) and artificial neural network regression (ANNR) were used to estimate the LAI for crops in the present study. The three different dates of Landsat-8 satellite images were used during January 2017 – March 2017 at different crops growth conditions in Varanasi district, India. The sampling regions were fully covered by major Rabi season crops like wheat, barley and mustard etc. In total pooled data, 60% samples were taken for the training of the algorithms and rest 40% samples were taken as testing and validation of the machinelearning regressions algorithms. The highest sensitivity of normalized difference vegetation index (NDVI) with LAI was found using RFR algorithms (R2 = 0.884, RMSE = 0.404) as compared to SVR (R2 = 0.847, RMSE = 0.478) and ANNR (R2 = 0.829, RMSE = 0.404). Therefore, RFR algorithms can be used for accurate estimation of LAI for crops using satellite data.

Download Full-text

Predictive model construction for prediction of soil fertility using decision tree machine learning algorithm

Kongunadu Research Journal ◽

10.26524/krj.2021.5 ◽

2021 ◽

Vol 8 (1) ◽

pp. 30-35

Author(s):

Jayalakshmi R ◽

Savitha Devi M

Keyword(s):

Machine Learning ◽

Decision Tree ◽

Soil Fertility ◽

Learning Algorithms ◽

Crop Productivity ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Support Vector ◽

Severe Problem ◽

Agriculture Sector

Agriculture sector is recognized as the backbone of the Indian economy that plays a crucial role in the growth of the nation’s economy. It imparts on weather and other environmental aspects. Some of the factors on which agriculture is reliant are Soil, climate, flooding, fertilizers, temperature, precipitation, crops, insecticides, and herb. The soil fertility is dependent on these factors and hence difficult to predict. However, the Agriculture sector in India is facing the severe problem of increasing crop productivity. Farmers lack the essential knowledge of nutrient content of the soil, selection of crop best suited for the soil and they also lack efficient methods for predicting crop well in advance so that appropriate methods have been used to improve crop productivity. This paper presents different Supervised Machine Learning Algorithms such as Decision tree, K-Nearest Neighbor (KNN), Support Vector Machine (SVM) to predict the fertility of soil based on macro-nutrients and micro-nutrients status found in the dataset. Supervised Machine Learning algorithms are applied on the training dataset and are tested with the test dataset, and the implementation of these algorithms is done using R Tool. The performance analysis of these algorithms is done using different evaluation metrics like mean absolute error, cross-validation, and accuracy. Result analysis shows that the Decision tree is produced the best accuracy of 99% with a very less mean square error (MSE) rate.

Download Full-text

Evaluation of Machine Learning Algorithms for Surface Water Extraction in a Landsat 8 Scene of Nepal

Sensors ◽

10.3390/s19122769 ◽

2019 ◽

Vol 19 (12) ◽

pp. 2769 ◽

Cited By ~ 8

Author(s):

Tri Dev Acharya ◽

Anoj Subedi ◽

Dong Ha Lee

Keyword(s):

Machine Learning ◽

Surface Water ◽

Recursive Partitioning ◽

Learning Algorithms ◽

High Elevation ◽

Machine Learning Algorithms ◽

Water Extraction ◽

Support Vector ◽

Landsat 8 ◽

Water Index

With over 6000 rivers and 5358 lakes, surface water is one of the most important resources in Nepal. However, the quantity and quality of Nepal’s rivers and lakes are decreasing due to human activities and climate change. Despite the advancement of remote sensing technology and the availability of open access data and tools, the monitoring and surface water extraction works has not been carried out in Nepal. Single or multiple water index methods have been applied in the extraction of surface water with satisfactory results. Extending our previous study, the authors evaluated six different machine learning algorithms: Naive Bayes (NB), recursive partitioning and regression trees (RPART), neural networks (NNET), support vector machines (SVM), random forest (RF), and gradient boosted machines (GBM) to extract surface water in Nepal. With three secondary bands, slope, NDVI and NDWI, the algorithms were evaluated for performance with the addition of extra information. As a result, all the applied machine learning algorithms, except NB and RPART, showed good performance. RF showed overall accuracy (OA) and kappa coefficient (Kappa) of 1 for the all the multiband data with the reference dataset, followed by GBM, NNET, and SVM in metrics. The performances were better in the hilly regions and flat lands, but not well in the Himalayas with ice, snow and shadows, and the addition of slope and NDWI showed improvement in the results. Adding single secondary bands is better than adding multiple in most algorithms except NNET. From current and previous studies, it is recommended to separate any study area with and without snow or low and high elevation, then apply machine learning algorithms in original Landsat data or with the addition of slopes or NDWI for better performance.

Download Full-text

A new ML-based approach to enhance student engagement in online environment

PLoS ONE ◽

10.1371/journal.pone.0258788 ◽

2021 ◽

Vol 16 (11) ◽

pp. e0258788

Author(s):

Sarra Ayouni ◽

Fahima Hajjej ◽

Mohamed Maddeh ◽

Shaha Al-Otaibi

Keyword(s):

Neural Network ◽

Machine Learning ◽

Artificial Neural Network ◽

Support Vector Machine ◽

Student Engagement ◽

Decision Tree ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Online Environment

The educational research is increasingly emphasizing the potential of student engagement and its impact on performance, retention and persistence. This construct has emerged as an important paradigm in the higher education field for many decades. However, evaluating and predicting the student’s engagement level in an online environment remains a challenge. The purpose of this study is to suggest an intelligent predictive system that predicts the student’s engagement level and then provides the students with feedback to enhance their motivation and dedication. Three categories of students are defined depending on their engagement level (Not Engaged, Passively Engaged, and Actively Engaged). We applied three different machine-learning algorithms, namely Decision Tree, Support Vector Machine and Artificial Neural Network, to students’ activities recorded in Learning Management System reports. The results demonstrate that machine learning algorithms could predict the student’s engagement level. In addition, according to the performance metrics of the different algorithms, the Artificial Neural Network has a greater accuracy rate (85%) compared to the Support Vector Machine (80%) and Decision Tree (75%) classification techniques. Based on these results, the intelligent predictive system sends feedback to the students and alerts the instructor once a student’s engagement level decreases. The instructor can identify the students’ difficulties during the course and motivate them through e-mail reminders, course messages, or scheduling an online meeting.

Download Full-text

Correction: Predicting Health Material Accessibility: Development of Machine Learning Algorithms (Preprint)

10.2196/preprints.33385 ◽

2021 ◽

Author(s):

Meng Ji ◽

Yanmeng Liu ◽

Tianyong Hao

Keyword(s):

Machine Learning ◽

Support Vector Machine ◽

Decision Tree ◽

Health Information ◽

Learning Algorithms ◽

Ensemble Classifier ◽

Machine Learning Algorithms ◽

Support Vector ◽

Semantic Features ◽

Cognitive Accessibility

BACKGROUND Current health information understandability research uses medical readability formulas to assess the cognitive difficulty of health education resources. This is based on an implicit assumption that medical domain knowledge represented by uncommon words or jargon form the sole barriers to health information access among the public. Our study challenged this by showing that, for readers from non-English speaking backgrounds with higher education attainment, semantic features of English health texts that underpin the knowledge structure of English health texts, rather than medical jargon, can explain the cognitive accessibility of health materials among readers with better understanding of English health terms yet limited exposure to English-based health education environments and traditions. OBJECTIVE Our study explores multidimensional semantic features for developing machine learning algorithms to predict the perceived level of cognitive accessibility of English health materials on health risks and diseases for young adults enrolled in Australian tertiary institutes. We compared algorithms to evaluate the cognitive accessibility of health information for nonnative English speakers with advanced education levels yet limited exposure to English health education environments. METHODS We used 113 semantic features to measure the content complexity and accessibility of original English resources. Using 1000 English health texts collected from Australian and international health organization websites rated by overseas tertiary students, we compared machine learning (decision tree, support vector machine, ensemble classifier, and logistic regression) after hyperparameter optimization (grid search for the best hyperparameter combination of minimal classification errors). We applied 5-fold cross-validation on the whole data set for the model training and testing; and calculated the area under the operating characteristic curve (AUC), sensitivity, specificity, and accuracy as the measurement of the model performance. RESULTS We developed and compared 4 machine learning algorithms using multidimensional semantic features as predictors. The results showed that ensemble classifier (LogitBoost) outperformed in terms of AUC (0.858), sensitivity (0.787), specificity (0.813), and accuracy (0.802). Support vector machine (AUC 0.848, sensitivity 0.783, specificity 0.791, and accuracy 0.786) and decision tree (AUC 0.754, sensitivity 0.7174, specificity 0.7424, and accuracy 0.732) followed. Ensemble classifier (LogitBoost), support vector machine, and decision tree achieved statistically significant improvement over logistic regression in AUC, sensitivity, specificity, and accuracy. Support vector machine reached statistically significant improvement over decision tree in AUC and accuracy. As the best performing algorithm, ensemble classifier (LogitBoost) reached statistically significant improvement over decision tree in AUC, sensitivity, specificity, and accuracy. CONCLUSIONS Our study shows that cognitive accessibility of English health texts is not limited to word length and sentence length as had been conventionally measured by medical readability formulas. We compared machine learning algorithms based on semantic features to explore the cognitive accessibility of health information for nonnative English speakers. The results showed the new models reached statistically increased AUC, sensitivity, and accuracy to predict health resource accessibility for the target readership. Our study illustrated that semantic features such as cognitive ability–related semantic features, communicative actions and processes, power relationships in health care settings, and lexical familiarity and diversity of health texts are large contributors to the comprehension of health information; for readers such as international students, semantic features of health texts outweigh syntax and domain knowledge.

Download Full-text

Machine Learning-Based Slum Mapping in Support of Slum Upgrading Programs: The Case of Bandung City, Indonesia

Remote Sensing ◽

10.3390/rs10101522 ◽

2018 ◽

Vol 10 (10) ◽

pp. 1522 ◽

Cited By ~ 14

Author(s):

Gina Leonita ◽

Monika Kuffer ◽

Richard Sliuzas ◽

Claudio Persello

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Large Scale ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

Slum Upgrading ◽

Sequential Feature Selection ◽

Local Acceptance ◽

Consistent Information

The survey-based slum mapping (SBSM) program conducted by the Indonesian government to reach the national target of “cities without slums” by 2019 shows mapping inconsistencies due to several reasons, e.g., the dependency on the surveyor’s experiences and the complexity of the slum indicators set. By relying on such inconsistent maps, it will be difficult to monitor the national slum upgrading program’s progress. Remote sensing imagery combined with machine learning algorithms could support the reduction of these inconsistencies. This study evaluates the performance of two machine learning algorithms, i.e., support vector machine (SVM) and random forest (RF), for slum mapping in support of the slum mapping campaign in Bandung, Indonesia. Recognizing the complexity in differentiating slum and formal areas in Indonesia, the study used a combination of spectral, contextual, and morphological features. In addition, sequential feature selection (SFS) combined with the Hilbert–Schmidt independence criterion (HSIC) was used to select significant features for classifying slums. Overall, the highest accuracy (88.5%) was achieved by the SVM with SFS using contextual, morphological, and spectral features, which is higher than the estimated accuracy of the SBSM. To evaluate the potential of machine learning-based slum mapping (MLBSM) in support of slum upgrading programs, interviews were conducted with several local and national stakeholders. Results show that local acceptance for a remote sensing-based slum mapping approach varies among stakeholder groups. Therefore, a locally adapted framework is required to combine ground surveys with robust and consistent machine learning methods, for being able to deal with big data, and to allow the rapid extraction of consistent information on the dynamics of slums at a large scale.

Download Full-text

Machine Learning Algorithms for Automatic Lithological Mapping Using Remote Sensing Data: A Case Study from Souk Arbaa Sahel, Sidi Ifni Inlier, Western Anti-Atlas, Morocco

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi8060248 ◽

2019 ◽

Vol 8 (6) ◽

pp. 248 ◽

Cited By ~ 7

Author(s):

Imane Bachri ◽

Mustapha Hakdaoui ◽

Mohammed Raji ◽

Ana Cláudia Teodoro ◽

Abdelmajid Benbouziane

Keyword(s):

Machine Learning ◽

Remote Sensing ◽

Learning Algorithms ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

High Dimensional ◽

Landsat 8 ◽

Landsat 8 Oli ◽

Lithological Mapping ◽

Sensing Data

Remote sensing data proved to be a valuable resource in a variety of earth science applications. Using high-dimensional data with advanced methods such as machine learning algorithms (MLAs), a sub-domain of artificial intelligence, enhances lithological mapping by spectral classification. Support vector machines (SVM) are one of the most popular MLAs with the ability to define non-linear decision boundaries in high-dimensional feature space by solving a quadratic optimization problem. This paper describes a supervised classification method considering SVM for lithological mapping in the region of Souk Arbaa Sahel belonging to the Sidi Ifni inlier, located in southern Morocco (Western Anti-Atlas). The aims of this study were (1) to refine the existing lithological map of this region, and (2) to evaluate and study the performance of the SVM approach by using combined spectral features of Landsat 8 OLI with digital elevation model (DEM) geomorphometric attributes of ALOS/PALSAR data. We performed an SVM classification method to allow the joint use of geomorphometric features and multispectral data of Landsat 8 OLI. The results indicated an overall classification accuracy of 85%. From the results obtained, we can conclude that the classification approach produced an image containing lithological units which easily identified formations such as silt, alluvium, limestone, dolomite, conglomerate, sandstone, rhyolite, andesite, granodiorite, quartzite, lutite, and ignimbrite, coinciding with those already existing on the published geological map. This result confirms the ability of SVM as a supervised learning algorithm for lithological mapping purposes.

Download Full-text