Active-Learning Approaches for Landslide Mapping Using Support Vector Machines

2021 ◽  
Vol 13 (13) ◽  
pp. 2588
Author(s):  
Zhihao Wang ◽  
Alexander Brenning

Ex post landslide mapping for emergency response and ex ante landslide susceptibility modelling for hazard mitigation are two important application scenarios that require the development of accurate, yet cost-effective spatial landslide models. However, the manual labelling of instances for training machine learning models is time-consuming given the data requirements of flexible data-driven algorithms and the small percentage of area covered by landslides. Active learning aims to reduce labelling costs by selecting more informative instances. In this study, two common active-learning strategies, uncertainty sampling and query by committee, are combined with the support vector machine (SVM), a state-of-the-art machine-learning technique, in a landslide mapping case study in order to assess their possible benefits compared to simple random sampling of training locations. By selecting more “informative” instances, the SVMs with active learning based on uncertainty sampling outperformed both random sampling and query-by-committee strategies when considering mean AUROC (area under the receiver operating characteristic curve) as performance measure. Uncertainty sampling also produced more stable performances with a smaller AUROC standard deviation across repetitions. In conclusion, under limited data conditions, uncertainty sampling reduces the amount of expert time needed by selecting more informative instances for SVM training. We therefore recommend incorporating active learning with uncertainty sampling into interactive landslide modelling workflows, especially in emergency response settings, but also in landslide susceptibility modelling.
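The core of the uncertainty-sampling strategy described above can be sketched in a few lines: an SVM is trained on the labelled instances, and the unlabelled location closest to the decision boundary (smallest absolute decision-function value) is queried next. This is a minimal illustration, not the authors' implementation; the synthetic imbalanced dataset and pool sizes are assumptions.

```python
# Minimal sketch of margin-based uncertainty sampling with an SVM.
# Synthetic imbalanced data stands in for landslide/non-landslide cells.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

# Seed the labelled set with instances from both classes
labeled = list(np.where(y == 0)[0][:10]) + list(np.where(y == 1)[0][:10])
pool = [i for i in range(len(X)) if i not in labeled]

svm = SVC(kernel="rbf", gamma="scale").fit(X[labeled], y[labeled])

# Uncertainty = distance to the decision boundary; the smaller |f(x)|,
# the more uncertain the prediction, so query the closest pool instance.
scores = np.abs(svm.decision_function(X[pool]))
query = pool[int(np.argmin(scores))]
labeled.append(query)   # in practice, an expert would label this instance
pool.remove(query)
```

In a full workflow this select-label-retrain step would repeat until the labelling budget is exhausted.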

2021 ◽  
Author(s):  
Vu-Linh Nguyen ◽  
Mohammad Hossein Shaker ◽  
Eyke Hüllermeier

Various strategies for active learning have been proposed in the machine learning literature. In uncertainty sampling, which is among the most popular approaches, the active learner sequentially queries the labels of those instances for which its current prediction is maximally uncertain. The predictions, as well as the measures used to quantify the degree of uncertainty, such as entropy, are traditionally of a probabilistic nature. Yet, alternative approaches to capturing uncertainty in machine learning, along with corresponding uncertainty measures, have been proposed in recent years. In particular, some of these measures seek to distinguish different sources of uncertainty and to separate different types, such as the reducible (epistemic) and the irreducible (aleatoric) parts of the total uncertainty in a prediction. The goal of this paper is to elaborate on the usefulness of such measures for uncertainty sampling and to compare their performance in active learning. To this end, we instantiate uncertainty sampling with different measures, analyze the properties of the resulting sampling strategies, and compare them in an experimental study.
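One common way to separate the two uncertainty types mentioned above uses an ensemble: the entropy of the averaged prediction gives total uncertainty, the average of the members' entropies approximates the aleatoric part, and their difference (the mutual information) the epistemic part. The sketch below illustrates this decomposition on hand-picked probability vectors; it is one instantiation among several discussed in the literature, not necessarily the paper's exact measures.

```python
# Ensemble-based decomposition of total uncertainty into aleatoric and
# epistemic parts, using Shannon entropy.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p), axis=-1)

def uncertainty_decomposition(probs):
    """probs: ensemble predictions, shape (n_members, n_classes)."""
    total = entropy(probs.mean(axis=0))   # entropy of the mean prediction
    aleatoric = entropy(probs).mean()     # mean entropy of the members
    epistemic = total - aleatoric         # mutual information
    return total, epistemic, aleatoric

# Members agreeing on a confident prediction: low epistemic uncertainty
agree = np.array([[0.90, 0.10], [0.88, 0.12]])
# Members disagreeing strongly: high epistemic uncertainty
disagree = np.array([[0.95, 0.05], [0.05, 0.95]])

_, ep_agree, _ = uncertainty_decomposition(agree)
_, ep_disagree, _ = uncertainty_decomposition(disagree)
```

An uncertainty sampler targeting epistemic uncertainty would query instances like the second case, where the model itself is undecided, rather than instances that are intrinsically noisy.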


2021 ◽  
Vol 10 (2) ◽  
pp. 93
Author(s):  
Wei Xie ◽  
Xiaoshuang Li ◽  
Wenbin Jian ◽  
Yang Yang ◽  
Hongwei Liu ◽  
...  

Landslide susceptibility mapping (LSM) can be an effective way to prevent landslide hazards and mitigate losses. The choice of conditioning factors is crucial to the results of LSM, and the selection of models also plays an important role. In this study, a hybrid method combining GeoDetector and a machine learning cluster was developed to provide a new perspective on how to address these two issues. We identified redundant factors by quantitatively analyzing their single and interactive impacts with GeoDetector, and the effect of this step was examined using the mean absolute error (MAE). The machine learning cluster contains four models (artificial neural network (ANN), Bayesian network (BN), logistic regression (LR), and support vector machine (SVM)) and automatically selects the best one for generating the LSM. The receiver operating characteristic (ROC) curve, prediction accuracy, and the seed cell area index (SCAI) were used to evaluate these methods. The results show that the SVM model had the best performance in the machine learning cluster, with an area under the ROC curve of 0.928 and an accuracy of 83.86%. Therefore, SVM was chosen as the assessment model to map the landslide susceptibility of the study area. The landslide susceptibility map agreed well with the landslide inventory, indicating that the hybrid method is effective in screening landslide influences and assessing landslide susceptibility.
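The "machine learning cluster" idea of fitting several candidate models and automatically keeping the best one by validation AUC can be sketched as follows. This is a hedged illustration: synthetic data stands in for the factor datasets, and scikit-learn's GaussianNB stands in for the paper's Bayesian network.

```python
# Fit four candidate classifiers and select the one with the highest
# validation AUC, as in an automatic model-selection "cluster".
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, random_state=1)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=1)

models = {
    "ANN": MLPClassifier(max_iter=1000, random_state=1),
    "NB": GaussianNB(),  # stand-in for the Bayesian network (BN)
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(probability=True, random_state=1),
}

aucs = {name: roc_auc_score(y_va, m.fit(X_tr, y_tr).predict_proba(X_va)[:, 1])
        for name, m in models.items()}
best = max(aucs, key=aucs.get)  # the model used to generate the LSM
```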


2021 ◽  
Author(s):  
Rui Liu ◽  
Xin Yang ◽  
Chong Xu ◽  
Luyao Li ◽  
Xiangqiang Zeng

Landslide susceptibility mapping (LSM) is a useful tool to estimate the probability of landslide occurrence, providing a scientific basis for natural hazard prevention, land use planning, and economic development in landslide-prone areas. To date, a large number of machine learning methods have been applied to LSM, and recently the advanced convolutional neural network (CNN) has been gradually adopted to enhance the prediction accuracy of LSM. The objective of this study is to introduce a CNN-based model for LSM and systematically compare its overall performance with the conventional machine learning models of random forest, logistic regression, and support vector machine. Herein, we selected the Jiuzhaigou region in Sichuan Province, China as the study area. A total of 710 landslides and 12 predisposing factors were stacked to form spatial datasets for LSM. ROC analysis and several statistical metrics, such as accuracy, root mean square error (RMSE), Kappa coefficient, sensitivity, and specificity, were used to evaluate the performance of the models on the training and validation datasets. Finally, the trained models were applied and the landslide susceptibility zones were mapped. Results suggest that both the CNN-based and conventional machine learning models achieve satisfactory performance (AUC: 85.72%–90.17%). The CNN-based model exhibits excellent goodness-of-fit and prediction capability, achieving the highest performance (AUC: 90.17%) while also significantly reducing the salt-and-pepper effect, which indicates its great potential for application to LSM.
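The key input representation here is that the 12 predisposing factors are stacked as channels of a multi-band raster, so a convolutional layer aggregates each cell's spatial neighbourhood across all factors at once. The sketch below shows one such convolution in plain NumPy; it illustrates the data layout only, not the authors' network architecture, and the grid size and kernel values are arbitrary.

```python
# One convolutional step over a stack of factor rasters: each output
# cell summarizes a 3x3 neighbourhood across all 12 factor channels.
import numpy as np

rng = np.random.default_rng(0)
factors = rng.random((12, 32, 32))        # 12 factor rasters on a 32x32 grid
kernel = rng.standard_normal((12, 3, 3))  # one filter spanning all channels

def conv2d_valid(x, k):
    c, h, w = x.shape
    kc, kh, kw = k.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted sum over the neighbourhood and all channels
            out[i, j] = np.sum(x[:, i:i + kh, j:j + kw] * k)
    return out

feature_map = np.maximum(conv2d_valid(factors, kernel), 0)  # ReLU activation
```

This neighbourhood aggregation is also why CNN predictions tend to be spatially smoother than per-pixel classifiers, reducing the salt-and-pepper effect.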


Author(s):  
Sarmad Mahar ◽  
Sahar Zafar ◽  
Kamran Nishat

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes, which help the reader quickly determine the issues discussed in the case. Headnotes comprise two parts: the first states the topic discussed in the judgment, and the second contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divided this task into a two-step process. In the first step, we predict the law points used in the judgment using text classification algorithms. The second step generates a summary of the judgment using text summarization techniques. To achieve this, we created a databank by extracting data from different law sources in Pakistan, and we labelled training data based on Pakistani law websites. We tested different feature extraction methods on judiciary data to improve our system, and used these methods to develop a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy using linear support vector classification with tri-grams and without stemming. Using active learning, the system can continuously improve its accuracy as its users provide additional labelled examples.
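The classification step described above (linear SVM over tri-gram features, no stemming) can be sketched with scikit-learn. The toy sentences and labels below are invented stand-ins for the judgment texts and law-point categories, so this shows the pipeline shape only, not the reported 65% result.

```python
# TF-IDF features up to tri-grams feeding a linear SVM classifier,
# without any stemming, on toy stand-in "judgment" texts.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_texts = [
    "the appeal against conviction is dismissed",
    "bail is granted subject to surety",
    "the appeal is allowed and the sentence set aside",
    "bail application rejected for lack of grounds",
]
train_labels = ["appeal", "bail", "appeal", "bail"]

clf = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),  # uni-, bi- and tri-grams
    LinearSVC(),
)
clf.fit(train_texts, train_labels)
pred = clf.predict(["the appeal stands dismissed"])[0]
```

In the active-learning extension, user-corrected predictions would be appended to `train_texts`/`train_labels` and the pipeline refit.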


2020 ◽  
Vol 198 ◽  
pp. 03023
Author(s):  
Xin Yang ◽  
Rui Liu ◽  
Luyao Li ◽  
Mei Yang ◽  
Yuantao Yang

Landslide susceptibility mapping is a method used to assess the probability and spatial distribution of landslide occurrences, and machine learning methods have been widely used for it in recent years. In this paper, six popular machine learning algorithms, namely logistic regression, multi-layer perceptron, random forest, support vector machine, AdaBoost, and gradient boosted decision tree, were leveraged to construct landslide susceptibility models with a total of 1365 landslide points and 14 predisposing factors. Subsequently, the landslide susceptibility maps (LSM) were generated by the trained models. The LSM shows that the main landslide zone is concentrated in the southeastern area of Wenchuan County. ROC curve analysis shows that all models fitted the training datasets and achieved satisfactory results on the validation datasets. The results of this paper reveal that machine learning methods are feasible for building robust landslide susceptibility models.
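Generating a susceptibility map from a trained model typically means predicting a landslide probability for every grid cell and binning it into susceptibility classes. The sketch below shows that step with a random forest on synthetic stand-in data; the five-class breaks are an illustrative convention, not necessarily those used in the paper.

```python
# From trained classifier to susceptibility classes: predict a landslide
# probability per grid cell, then bin probabilities into five levels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=14, random_state=2)
model = RandomForestClassifier(random_state=2).fit(X, y)

grid_cells = X[:100]  # stand-in for the study-area raster cells
prob = model.predict_proba(grid_cells)[:, 1]

breaks = [0.2, 0.4, 0.6, 0.8]  # illustrative class boundaries
labels = np.array(["very low", "low", "moderate", "high", "very high"])
susceptibility = labels[np.digitize(prob, breaks)]
```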


2020 ◽  
Vol 10 (11) ◽  
pp. 4016 ◽  
Author(s):  
Xudong Hu ◽  
Han Zhang ◽  
Hongbo Mei ◽  
Dunhui Xiao ◽  
Yuanyuan Li ◽  
...  

Landslide susceptibility mapping is considered a prerequisite for landslide prevention and mitigation. However, delineating the spatial occurrence pattern of landslides remains a challenge. This study investigates the potential application of the stacking ensemble learning technique for landslide susceptibility assessment. In particular, support vector machine (SVM), artificial neural network (ANN), logistic regression (LR), and naive Bayes (NB) were selected as base learners for the stacking ensemble method. A resampling scheme and Pearson's correlation analysis were jointly used to evaluate the importance of these base learners. A total of 388 landslides and 12 conditioning factors in the Lushui area (Southwest China) were used as the dataset to develop the landslide models. The landslides were randomly separated into two parts, with 70% used for model training and 30% for model validation. The models' performance was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC) and statistical measures. The results showed that the stacking-based ensemble model achieved improved predictive accuracy compared to the single algorithms, while the SVM-ANN-NB-LR (SANL), SVM-ANN-NB (SAN), and ANN-NB-LR (ANL) models performed equally well, with AUC values of 0.931, 0.940, and 0.932, respectively, in the validation stage. The correlation coefficient between LR and SVM was the highest across all resampling rounds, with an average value of 0.72. This indicates that LR and SVM played an almost equal role when the SANL ensemble was applied for landslide susceptibility analysis. Therefore, it is feasible to use the SAN or ANL model for the study area. The findings from this study suggest that the stacking ensemble machine learning method is promising for landslide susceptibility mapping in the Lushui area and is capable of targeting areas prone to landslides.
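The four-learner stacking setup can be sketched directly with scikit-learn's `StackingClassifier`: the base learners' out-of-fold predictions feed a meta-learner that produces the final probability. Synthetic data stands in for the 388-landslide dataset, and the logistic meta-learner is an assumption (the paper does not dictate scikit-learn's implementation).

```python
# Stacking SVM, ANN, naive Bayes and LR base learners under a logistic
# meta-learner, evaluated by validation AUC.
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=12, random_state=3)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=3)

stack = StackingClassifier(
    estimators=[
        ("svm", SVC(probability=True, random_state=3)),
        ("ann", MLPClassifier(max_iter=1000, random_state=3)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
)
stack.fit(X_tr, y_tr)
auc = roc_auc_score(y_va, stack.predict_proba(X_va)[:, 1])
```

Dropping entries from `estimators` gives the reduced SAN or ANL variants compared in the study.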


2020 ◽  
Vol 28 (4) ◽  
pp. 532-551
Author(s):  
Blake Miller ◽  
Fridolin Linder ◽  
Walter R. Mebane

Supervised machine learning methods are increasingly employed in political science. Such models require costly manual labeling of documents. In this paper, we introduce active learning, a framework in which data to be labeled by human coders are not chosen at random but rather targeted in such a way that the required amount of data to train a machine learning model can be minimized. We study the benefits of active learning using text data examples. We perform simulation studies that illustrate conditions where active learning can reduce the cost of labeling text data. We perform these simulations on three corpora that vary in size, document length, and domain. We find that in cases where the document class of interest is not balanced, researchers can label a fraction of the documents one would need using random sampling (or “passive” learning) to achieve equally performing classifiers. We further investigate how varying levels of intercoder reliability affect the active learning procedures and find that even with low reliability, active learning performs more efficiently than does random sampling.
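A minimal version of the simulation design described above compares pool-based active learning (uncertainty sampling) against random ("passive") labeling on an imbalanced dataset, counting how many minority-class documents each strategy surfaces within the same label budget. The data, budget, and classifier are illustrative assumptions, not the paper's corpora.

```python
# Active (uncertainty sampling) vs passive (random) labeling on an
# imbalanced pool, under a fixed labeling budget.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(4)
X, y = make_classification(n_samples=1000, weights=[0.95, 0.05],
                           flip_y=0, random_state=4)

def run(active, budget=60):
    # Seed with 5 labeled documents from each class
    idx = list(rng.choice(np.where(y == 0)[0], 5, replace=False)) + \
          list(rng.choice(np.where(y == 1)[0], 5, replace=False))
    pool = [i for i in range(len(X)) if i not in idx]
    for _ in range(budget):
        clf = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
        if active:  # query the instance whose prediction is closest to 0.5
            p = clf.predict_proba(X[pool])[:, 1]
            q = pool[int(np.argmin(np.abs(p - 0.5)))]
        else:       # passive: label a randomly chosen instance
            q = pool[rng.randint(len(pool))]
        idx.append(q)
        pool.remove(q)
    return int(y[idx].sum())  # minority-class documents labeled

found_active, found_passive = run(True), run(False)
```

With a rare class of interest, uncertainty sampling tends to concentrate labeling effort near the class boundary and thus surfaces more minority documents per label than random sampling.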


2021 ◽  
Vol 11 ◽  
Author(s):  
Weixin Xie ◽  
Limei Wang ◽  
Qi Cheng ◽  
Xueying Wang ◽  
Ying Wang ◽  
...  

Clinical drug–drug interactions (DDIs) have been a major cause of not only medical errors but also adverse drug events (ADEs). The published literature on DDI clinical toxicity continues to grow significantly, and high-performance DDI information retrieval (IR) text mining methods are in high demand. The effectiveness of IR and its machine learning (ML) algorithms depends on the availability of a large amount of training and validation data that have been manually reviewed and annotated. In this study, we investigated how active learning (AL) might improve ML performance in clinical safety DDI IR analysis. We recognized that a direct application of AL would not address several primary challenges in DDI IR from the literature: the vast majority of abstracts in PubMed are negative, existing positive and negative labeled samples do not represent the general sample distribution, and potentially biased samples may arise during uncertainty sampling in an AL algorithm. We therefore developed several novel sampling and ML schemes to improve AL performance in DDI IR analysis. In particular, random negative sampling was added as a part of AL, since it incurs no manual labeling expense. We also used two ML algorithms in the AL process to differentiate random negative samples from manually labeled negative samples, and we updated both the training and validation samples during the AL process to avoid or reduce biased sampling. Two supervised ML algorithms, support vector machine (SVM) and logistic regression (LR), were used to investigate the consistency of our proposed AL algorithm. Because the ultimate goal of clinical safety DDI IR is to retrieve all DDI toxicity-relevant abstracts, a recall rate of 0.99 was set in developing the AL methods.
When we used our newly proposed AL method with SVM, the precision in differentiating positive samples from manually labeled negative samples improved from 0.45 in the first round to 0.83 in the second round, and the precision in differentiating positive samples from random negative samples improved from 0.70 to 0.82 between the first and second rounds. When our proposed AL method was used with LR, the improvements in precision followed a similar trend. However, the other AL algorithms tested did not show improved precision, largely because of biased samples caused by uncertainty sampling or differences between the training and validation datasets.
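The random-negative-sampling idea is simple to state concretely: because almost all unlabeled PubMed abstracts are negative, abstracts drawn at random from the unlabeled pool can serve as negatives at zero labeling cost, alongside the manually labeled positives. The sketch below shows only this index bookkeeping; pool and sample sizes are invented for illustration.

```python
# Build a training set from labeled positives plus random negatives
# drawn from the unlabeled pool (no manual labeling expense).
import numpy as np

rng = np.random.RandomState(7)
n_pool = 10_000                      # stand-in for an unlabeled abstract pool
labeled_pos = np.arange(50)          # indices of manually labeled positives

unlabeled = np.setdiff1d(np.arange(n_pool), labeled_pos)
random_neg = rng.choice(unlabeled, size=200, replace=False)

train_idx = np.concatenate([labeled_pos, random_neg])
train_y = np.concatenate([np.ones(50), np.zeros(200)])
```

A small fraction of the sampled "negatives" will in fact be positive, which is why the study pairs this scheme with classifiers that distinguish random negatives from manually verified ones.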


Forests ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 118 ◽  
Author(s):  
Viet-Hung Dang ◽  
Nhat-Duc Hoang ◽  
Le-Mai-Duyen Nguyen ◽  
Dieu Tien Bui ◽  
Pijush Samui

This study developed and verified a new hybrid machine learning model, named random forest machine (RFM), for the spatial prediction of shallow landslides. RFM is a hybridization of two state-of-the-art machine learning algorithms, the random forest classifier (RFC) and the support vector machine (SVM), in which the RFC is used to generate subsets from the training data and SVMs are used to build decision functions for these subsets. To construct and verify the hybrid RFM model, a shallow landslide database of the Lang Son area (northern Vietnam) was prepared, consisting of 101 shallow landslide polygons and 14 conditioning factors. The relevance of these factors for shallow landslide susceptibility modeling was assessed using the ReliefF method. Experimental results showed that the proposed RFM achieves the desired predictive performance, with an F1 score of roughly 0.96. The performance of the RFM was better than that of the benchmark approaches, including SVM, RFC, and logistic regression. Thus, the newly developed RFM is a promising tool to help local authorities in shallow landslide hazard mitigation.
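One loose reading of the RFM construction is a bagging-style hybrid: draw bootstrap subsets of the training data (as a random forest would for its trees), fit an SVM decision function on each subset, and aggregate predictions by majority vote. The sketch below implements that reading on synthetic data; the authors' exact subset-generation and aggregation scheme may differ.

```python
# Bagged-SVM sketch of the RFM idea: SVM decision functions trained on
# bootstrap subsets, combined by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

rng = np.random.RandomState(5)
X, y = make_classification(n_samples=300, n_features=14, random_state=5)

svms = []
for _ in range(10):  # ten bootstrap subsets
    idx = rng.choice(len(X), size=len(X), replace=True)
    svms.append(SVC(random_state=5).fit(X[idx], y[idx]))

def predict(x):
    votes = np.array([m.predict(x) for m in svms])  # shape (10, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)   # majority vote

pred = predict(X[:20])
```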


2020 ◽  
Vol 12 (24) ◽  
pp. 4015
Author(s):  
Yuzhen Zhang ◽  
Jun Ma ◽  
Shunlin Liang ◽  
Xisheng Li ◽  
Manyao Li

This study provided a comprehensive evaluation of eight machine learning regression algorithms for forest aboveground biomass (AGB) estimation from satellite data, based on leaf area index, canopy height, net primary production, and tree cover data, as well as climatic and topographical data. Some of these algorithms, such as extremely randomized trees, stochastic gradient boosting, and categorical boosting (CatBoost) regression, have not been commonly used for forest AGB estimation. For each algorithm, the hyperparameters were optimized using grid search with cross-validation, the optimal AGB model was developed on the training dataset (80%), and AGB was predicted on the test dataset (20%). Performance metrics, feature importance, and overestimation and underestimation were considered as indicators for evaluating the performance of each algorithm. To reduce the impact of the random training-test split and the sampling method on performance, the above procedures were repeated 50 times for each algorithm under the random sampling, stratified sampling, and separate modeling scenarios. The results showed that the five tree-based ensemble algorithms performed better than the three non-ensemble algorithms (multivariate adaptive regression splines, support vector regression, and multilayer perceptron), and the CatBoost algorithm outperformed the others for AGB estimation. Compared with the random sampling scenario, the stratified sampling and separate modeling scenarios did not significantly improve the AGB estimates, but modeling AGB for each forest type separately provided stable results in terms of the contributions of the predictor variables. All the algorithms underestimated forest AGB when values were larger than 210 Mg/ha and overestimated it when values were less than 120 Mg/ha. This study highlighted the capability of ensemble algorithms to improve AGB estimates and the necessity of improving estimates at high and low AGB levels in future studies.
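The tuning protocol described above (grid search with cross-validation, then evaluation on a held-out 20% test split) can be sketched with scikit-learn. Here `GradientBoostingRegressor` stands in for CatBoost, and the parameter grid and synthetic regression data are illustrative assumptions.

```python
# Hyperparameter tuning by cross-validated grid search on an 80% train
# split, with final evaluation on the 20% test split.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=300, n_features=8, noise=5.0, random_state=6)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=6)

grid = GridSearchCV(
    GradientBoostingRegressor(random_state=6),
    param_grid={"n_estimators": [100, 200], "max_depth": [2, 3]},
    cv=5,
)
grid.fit(X_tr, y_tr)
r2 = r2_score(y_te, grid.best_estimator_.predict(X_te))
```

Repeating this split-tune-test cycle 50 times, as in the study, averages out the effect of any single random train-test partition.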

