Counteracting flawed landslide data in statistically based landslide susceptibility modelling for very large areas: a national-scale assessment for Austria

Landslides ◽  
2021 ◽  
Author(s):  
Pedro Lima ◽  
Stefan Steger ◽  
Thomas Glade

Abstract The reliability of the input data used within statistically based landslide susceptibility models usually determines the quality of the resulting maps. For very large territories, landslide susceptibility assessments are commonly built upon spatially incomplete and positionally inaccurate landslide information. The unavailability of flawless input data is contrasted by the need to identify landslide-prone terrain at such spatial scales. Instead of simply ignoring errors in the landslide data, we argue that modellers have to explicitly adapt their modelling design to avoid misleading results. This study examined different modelling strategies to reduce the undesirable effects of error-prone landslide inventory data, namely systematic spatial incompleteness and positional inaccuracy. For this purpose, the Austrian territory, with its abundant but heterogeneous landslide data, was selected as the study site. Conventional modelling practices were compared with alternative modelling designs to elucidate whether actively counterbalancing flawed landslide information can improve the modelling results. In this context, we compared the widely applied logistic regression with an approach that allows minimizing the effects of heterogeneously complete landslide information (i.e. mixed-effects logistic regression). The challenge of positionally inaccurate landslide samples was tackled by elaborating and comparing the models for different terrain representations, namely grid cells and slope units. The results showed that conventional logistic regression tended to reproduce the incompleteness inherent in the landslide training data when the underlying model relied on explanatory variables directly related to the data bias. The adoption of a mixed-effects modelling approach appeared to reduce these undesired effects and led to geomorphologically more coherent spatial predictions. As a consequence of their larger spatial extent, the slope unit–based models were able to cope better with positional inaccuracies in the landslide data than their grid-based counterparts. The presented research demonstrates that, in the context of very large area susceptibility modelling, (i) ignoring flaws in available landslide data can lead to geomorphologically incoherent results despite an apparently high statistical performance, and (ii) landslide data imperfections can be actively diminished by adjusting the research design to the respective input data imperfections.
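A minimal sketch of the mixed-effects approach described above, assuming a hypothetical table of terrain units with a binary landslide column, geomorphic predictors, and a `district` column encoding the inventory-providing region whose heterogeneous completeness is absorbed by a random intercept (statsmodels' Bayesian mixed GLM stands in here; it is not the authors' implementation):

```python
# A minimal sketch (not the authors' code) of mixed-effects logistic
# regression for landslide susceptibility. `terrain_units.csv`, the column
# names, and the `district` grouping variable are assumptions.
import pandas as pd
from statsmodels.genmod.bayes_mixed_glm import BinomialBayesMixedGLM

df = pd.read_csv("terrain_units.csv")  # hypothetical input table

# Fixed effects: geomorphic predictors; random intercept: data-provider district,
# which absorbs district-specific inventory (in)completeness.
model = BinomialBayesMixedGLM.from_formula(
    "landslide ~ slope + elevation + C(lithology)",
    vc_formulas={"district": "0 + C(district)"},
    data=df,
)
result = model.fit_vb()  # variational Bayes fit
print(result.summary())

# Predicting from the fixed-effects part alone yields a susceptibility
# estimate that is not driven by district-specific data completeness.
```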

2020 ◽  
Author(s):  
Stefan Steger ◽  
Volkmar Mair ◽  
Christian Kofler ◽  
Stefan Schneiderbauer ◽  
Marc Zebisch

Most statistically based landslide susceptibility maps are supposed to portray the relative likelihood of an area being affected by future landslides. The literature indicates that vital modelling decisions, such as the selection of explanatory variables, are frequently based on quantitative criteria (e.g. predictive performance). The results obtained by apparently well-performing statistical models are also used to infer the causes of slope instability and to identify landslide "safe" terrain. It seems that comparably few studies pay particular attention to the background information associated with the available landslide data. This research hypothesizes that inappropriate modelling decisions and wrong conclusions are likely to follow whenever the origin of the underlying landslide data is ignored. The aims were to (i) analyze the South Tyrolean landslide inventory in the context of its origin in order to (ii) highlight potential pitfalls of performance-driven procedures and to (iii) develop a predictive model that takes landslide background information into account. The available landslide data (1928 slide-type movements) of the province of South Tyrol (~7400 km²) consist of positionally accurate points that depict the scarp locations of events that induced interventions by, for example, the road service or the geological office. An initial exploratory statistical analysis revealed general relationships between landslide presence/absence data and frequently used explanatory variables. Subsequent modelling was based on a Generalized Additive Mixed Effects Model that allowed accounting for (non-linear) fixed effects and additional "nuisance" variables (random intercepts). The evaluation of the models (diverse variable combinations) focused on the modelled relationships, variable importance, spatial and non-spatial predictive performance, and the final prediction surfaces. The results highlighted that the best-performing models did not reflect the "actual" landslide susceptibility situation. A critical interpretation led to the conclusion that the models simultaneously reflected both effects likely related to slope instability (e.g. low likelihood of flat and very steep terrain) and effects rather associated with the provincial landslide intervention strategy (e.g. few interventions at high altitudes, increasing number of interventions with decreasing distance to infrastructure). Attempts to separate the nuisance related to "intervention effects" from the actual landslide effects using mixed-effects modelling proved challenging, also due to omnipresent spatial interrelations among the explanatory variables and the fact that some variables concurrently represent effects related to landslide predisposition and effects associated with the intervention strategy (e.g. altitude). We developed a well-performing predictive landslide intervention index that is in line with the actual data origin and allows identifying areas where future interventions are more or less likely to take place. The efficiency of past interventions (e.g. stabilization of slopes) was demonstrated during recent storm events, because previously stabilized slopes were not affected by new landslides. This also showed that the correct interpretation of the final map requires a simultaneous visualization of both the spatially predicted index (from low to high) and the available landslide inventory (low likelihood due to past interventions). The results confirm that wrong conclusions can be drawn from excellently performing statistical models whenever qualitative background information is disregarded.
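The "nuisance" separation described above is difficult to reproduce without the original data; the sketch below is only a fixed-effects logistic GAM in pygam, where a factor term over a hypothetical intervention-district code crudely stands in for the random intercept of the generalized additive mixed-effects model:

```python
# A minimal sketch (assumptions, not the authors' code). pygam provides no
# true random intercepts, so the factor term f(2) over a hypothetical
# district code only roughly mimics the "nuisance" random intercept.
import numpy as np
from pygam import LogisticGAM, s, f

# Assumed predictor matrix columns: 0 = slope angle, 1 = elevation, 2 = district code
X = np.load("predictors.npy")          # hypothetical predictor matrix
y = np.load("landslide_labels.npy")    # 1 = intervention-triggering landslide, 0 = absence

gam = LogisticGAM(s(0) + s(1) + f(2)).fit(X, y)
gam.summary()

# Spatial prediction: probability surface over all terrain units
susceptibility = gam.predict_proba(X)
```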


Entropy ◽  
2019 ◽  
Vol 21 (2) ◽  
pp. 218 ◽  
Author(s):  
Tingyu Zhang ◽  
Ling Han ◽  
Jichang Han ◽  
Xian Li ◽  
Heng Zhang ◽  
...  

The main aim of this study was to compare and evaluate the performance of fractal dimension as input data in the landslide susceptibility mapping of the Baota District, Yan’an City, China. First, a total of 632 points, including 316 landslide points and 316 non-landslide points, were located in the landslide inventory map. All points were divided into two parts according to a ratio of 70%:30%, with 70% (442 points) used as the training dataset to train the models and the remaining 30% (190 points) used as the validation dataset. Second, 13 predisposing factors, including slope aspect, slope angle, altitude, lithology, mean annual precipitation (MAP), distance to rivers, distance to faults, distance to roads, normalized difference vegetation index (NDVI), topographic wetness index (TWI), plan curvature, profile curvature, and terrain roughness index (TRI), were selected. Then, the original numerical data, the box-counting dimension, and the correlation dimension corresponding to each predisposing factor were calculated to generate the input data and build three classification models, namely the kernel logistic regression model (KLR), the kernel logistic regression model based on the box-counting dimension (KLRbox-counting), and the kernel logistic regression model based on the correlation dimension (KLRcorrelation). Next, statistical indexes and the receiver operating characteristic (ROC) curve were employed to evaluate the models’ performance. Finally, the KLRcorrelation model achieved the highest area under the curve (AUC) values of 0.8984 and 0.9224 on the training and validation datasets, respectively, indicating that fractal dimension can be used as input data for landslide susceptibility mapping with improved performance.
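As a hedged illustration of how a box-counting dimension can be computed for a factor raster around a sample point (the authors' exact windowing and thresholding are not reproduced here):

```python
# A minimal sketch (an illustration of the general technique, not the
# authors' exact procedure) of a box-counting dimension estimate for a
# thresholded raster patch of one predisposing factor.
import numpy as np

def box_counting_dimension(binary_patch: np.ndarray) -> float:
    """Estimate the box-counting (Minkowski) dimension of a 2-D binary array."""
    assert binary_patch.ndim == 2
    n = min(binary_patch.shape)
    sizes = [2 ** k for k in range(1, int(np.log2(n)))]  # box edge lengths
    counts = []
    for size in sizes:
        # count boxes of side `size` containing at least one occupied cell
        h = binary_patch.shape[0] // size * size
        w = binary_patch.shape[1] // size * size
        blocks = binary_patch[:h, :w].reshape(h // size, size, w // size, size)
        occupied = blocks.any(axis=(1, 3)).sum()
        counts.append(max(occupied, 1))
    # slope of log(count) vs. log(1/size) gives the fractal dimension
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

# Example: dimension of cells steeper than 30 degrees in a simulated 128x128 patch
rng = np.random.default_rng(0)
slope_patch = rng.uniform(0, 60, size=(128, 128))
print(box_counting_dimension(slope_patch > 30))
```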


2020 ◽  
Vol 12 (3) ◽  
pp. 486 ◽  
Author(s):  
Aaron E. Maxwell ◽  
Maneesh Sharma ◽  
James S. Kite ◽  
Kurt A. Donaldson ◽  
James A. Thompson ◽  
...  

The probabilistic mapping of landslide occurrence at a high spatial resolution and over a large geographic extent is explored using random forests (RF) machine learning; light detection and ranging (LiDAR)-derived terrain variables; additional variables relating to lithology, soils, distance to roads and streams and cost distance to roads and streams; and training data interpreted from high spatial resolution LiDAR-derivatives. Using a large training set and all predictor variables, an area under the receiver operating characteristic (ROC) curve (AUC) of 0.946 is obtained. Our findings highlight the value of a large training dataset, the incorporation of a variety of terrain variables and the use of variable window sizes to characterize the landscape at different spatial scales. We also document important variables for mapping slope failures. Our results suggest that feature selection is not required to improve the RF modeling results and that incorporating multiple models using different pseudo absence samples is not necessary. From our findings and based on a review of prior studies, we make recommendations for high spatial resolution, large-area slope failure probabilistic mapping.
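A minimal sketch, not the authors' pipeline, of random forest slope-failure mapping with ROC AUC evaluation and variable importances, assuming a hypothetical table of LiDAR-derived terrain variables and ancillary predictors:

```python
# A minimal sketch (not the authors' workflow); file name, column names,
# and hyperparameters are assumptions.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("slope_failure_training.csv")   # hypothetical training table
X = df.drop(columns=["failure"])                 # terrain, lithology, soils, distance variables
y = df["failure"]                                # 1 = mapped slope failure, 0 = pseudo absence

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=42).fit(X_train, y_train)
probs = rf.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, probs))

# Variable importances document which terrain variables drive the mapping
for name, imp in sorted(zip(X.columns, rf.feature_importances_), key=lambda t: -t[1])[:10]:
    print(f"{name}: {imp:.3f}")
```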


Author(s):  
Elaine C Khoong ◽  
Valy Fontil ◽  
Natalie A Rivadeneira ◽  
Mekhala Hoskote ◽  
Shantanu Nundy ◽  
...  

Abstract Objective The study sought to evaluate whether peer input on outpatient cases impacted diagnostic confidence. Materials and Methods This randomized trial of a peer input intervention occurred among 28 clinicians with case-level randomization. Encounters with diagnostic uncertainty were entered onto a digital platform to collect input from ≥5 clinicians. The primary outcome was diagnostic confidence. We used mixed-effects logistic regression analyses to assess the intervention's impact on diagnostic confidence. Results Among the 509 cases (255 control; 254 intervention), the intervention did not impact confidence (odds ratio [OR], 1.46; 95% confidence interval [CI], 0.999-2.12), but after adjusting for clinician and case traits, the intervention was associated with higher confidence (OR, 1.53; 95% CI, 1.01-2.32). The intervention impact was greater in cases with high uncertainty (OR, 3.23; 95% CI, 1.09-9.52). Conclusions Peer input increased diagnostic confidence primarily in high-uncertainty cases, consistent with findings that clinicians desire input primarily in cases with continued uncertainty.
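As an illustrative approximation of this kind of analysis (clinician-clustered standard errors standing in for clinician random intercepts; file and column names are hypothetical), odds ratios and confidence intervals could be reported as follows:

```python
# A minimal sketch (an approximation, not the study's analysis code):
# logistic regression of diagnostic confidence on the intervention with
# clinician-clustered standard errors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("cases.csv")  # one row per case: confident (0/1), intervention (0/1), high_uncertainty, clinician_id

fit = smf.logit("confident ~ intervention + high_uncertainty", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["clinician_id"]}
)

# Report odds ratios with 95% confidence intervals
odds_ratios = np.exp(fit.params)
conf_int = np.exp(fit.conf_int())
print(pd.concat([odds_ratios.rename("OR"),
                 conf_int.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```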


2020 ◽  
Vol 7 (Supplement_1) ◽  
pp. S375-S376
Author(s):  
Ljubomir Buturovic ◽  
Purvesh Khatri ◽  
Benjamin Tang ◽  
Kevin Lai ◽  
Win Sen Kuan ◽  
...  

Abstract Background While major progress has been made to establish diagnostic tools for the diagnosis of SARS-CoV-2 infection, determining the severity of COVID-19 remains an unmet medical need. With limited hospital resources, gauging severity would allow some patients to safely recover in home quarantine while ensuring that sicker patients get needed care. We discovered a 5-mRNA host-based classifier for the severity of influenza and other acute viral infections and validated the classifier in COVID-19 patients from Greece. Methods We used training data (N=705) from 21 retrospective clinical studies of influenza and other viral illnesses. Five host mRNAs from a preselected panel were used to train a logistic regression classifier for predicting 30-day mortality in influenza and other viral illnesses. We then applied this classifier, with fixed weights, to an independent cohort of subjects with confirmed COVID-19 from Athens, Greece (N=71) using the NanoString nCounter. Finally, we developed a proof-of-concept rapid, isothermal qRT-LAMP assay for the 5-mRNA host signature using the QuantStudio 6 qPCR platform. Results In 71 patients with COVID-19, the 5-mRNA classifier had an AUROC of 0.88 (95% CI 0.80-0.97) for identifying patients with severe respiratory failure and/or 30-day mortality (Figure 1). Applying a preset cutoff based on the training data, the 5-mRNA classifier had 100% sensitivity and 46% specificity for identifying mortality, and 88% sensitivity and 68% specificity for identifying severe respiratory failure. Finally, our proof-of-concept qRT-LAMP assay showed high correlation with the reference NanoString 5-mRNA classifier (r=0.95). Figure 1. Validation of the 5-mRNA classifier in the COVID-19 cohort. (A) Expression of the 5 genes used in the logistic regression model in patients with (red) and without (blue) mortality. (B) The 5-mRNA classifier accurately distinguishes non-severe and severe patients with COVID-19 as well as those at risk of death. Conclusion Our 5-mRNA classifier demonstrated very high accuracy for the prediction of COVID-19 severity and could assist in the rapid, point-of-impact assessment of patients with confirmed COVID-19 to determine level of care, thereby improving patient management and reducing healthcare burden. Disclosures Ljubomir Buturovic, PhD, Inflammatix Inc. (Employee, Shareholder) Purvesh Khatri, PhD, Inflammatix Inc. (Shareholder) Oliver Liesenfeld, MD, Inflammatix Inc. (Employee, Shareholder) James Wacker, n/a, Inflammatix Inc. (Employee, Shareholder) Uros Midic, PhD, Inflammatix Inc. (Employee, Shareholder) Roland Luethy, PhD, Inflammatix Inc. (Employee, Shareholder) David C. Rawling, PhD, Inflammatix Inc. (Employee, Shareholder) Timothy Sweeney, MD, Inflammatix, Inc. (Employee)
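A hedged sketch of the general idea of applying a logistic classifier with fixed weights and a preset cutoff to new expression data; the weights, cutoff, and data below are placeholders, not the published signature:

```python
# A minimal sketch (illustrative only; the published classifier's genes,
# weights, and cutoff are NOT reproduced here) of scoring new samples with
# a frozen logistic model at a preset threshold.
import numpy as np

# Hypothetical frozen parameters learned on a training cohort
WEIGHTS = np.array([0.8, -1.2, 0.5, 1.1, -0.4])   # one weight per host mRNA
INTERCEPT = -0.3
CUTOFF = 0.35                                      # preset decision threshold

def severity_score(expression: np.ndarray) -> np.ndarray:
    """Logistic score for an (n_samples, 5) matrix of normalized gene counts."""
    return 1.0 / (1.0 + np.exp(-(expression @ WEIGHTS + INTERCEPT)))

def sensitivity_specificity(y_true, scores, cutoff=CUTOFF):
    pred = scores >= cutoff
    tp = np.sum(pred & (y_true == 1)); fn = np.sum(~pred & (y_true == 1))
    tn = np.sum(~pred & (y_true == 0)); fp = np.sum(pred & (y_true == 0))
    return tp / (tp + fn), tn / (tn + fp)

# Example with simulated validation data
rng = np.random.default_rng(1)
X_val = rng.normal(size=(71, 5))
y_val = rng.integers(0, 2, size=71)
sens, spec = sensitivity_specificity(y_val, severity_score(X_val))
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```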


2021 ◽  
Vol 11 (15) ◽  
pp. 7148
Author(s):  
Bedada Endale ◽  
Abera Tullu ◽  
Hayoung Shi ◽  
Beom-Soo Kang

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions in both the civilian and military sectors. Many of these missions demand that UAVs acquire awareness of the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well-known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes at a high cost in time and computational resources. Collecting large input datasets, pre-training processes such as labeling the training data, and the need for a high-performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission-specific input-data augmentation techniques and the design of a lightweight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes with 10,000 different images per class were used as input data, of which 80% were used for training the network and the remaining 20% for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.
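A minimal sketch, under assumed image size and layer widths rather than the paper's architecture, of a lightweight CNN for ten-class image classification trained with stochastic (sequential) gradient descent in PyTorch:

```python
# A minimal sketch (not the paper's network); input size, layer widths,
# and hyperparameters are assumptions.
import torch
import torch.nn as nn

class LightweightCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # assumes 64x64 RGB input

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = LightweightCNN()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # sequential (stochastic) gradient descent
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stands in for augmented SVO imagery)
images, labels = torch.randn(8, 3, 64, 64), torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```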


2020 ◽  
Vol 26 (2) ◽  
pp. 185-200
Author(s):  
Said Benchelha ◽  
Hasnaa Chennaoui Aoudjehane ◽  
Mustapha Hakdaoui ◽  
Rachid El Hamdouni ◽  
Hamou Mansouri ◽  
...  

ABSTRACT Landslide susceptibility indices were calculated and landslide susceptibility maps were generated for the Oudka, Morocco, study area using a geographic information system. The spatial database included current landslide locations, topography, soil, hydrology, and lithology, and the eight factors related to landslides (elevation, slope, aspect, distance to streams, distance to roads, distance to faults, lithology, and Normalized Difference Vegetation Index [NDVI]) were calculated or extracted. Logistic regression (LR), multivariate adaptive regression splines (MARSpline), and artificial neural networks (ANN) were the methods used in this study to generate landslide susceptibility indices. Before the calculation, the study area was randomly divided into two parts, the first for the establishment of the model and the second for its validation. The results of the landslide susceptibility analysis were verified using success and prediction rates. The MARSpline model gave a higher success rate (area under the curve [AUC] = 0.963) and prediction rate (AUC = 0.951) than the LR model (AUC = 0.918 and 0.901) and the ANN model (AUC = 0.886 and 0.877). These results indicate that the MARSpline model is the best model for determining landslide susceptibility in the study area.
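A hedged sketch of the success-rate/prediction-rate comparison using scikit-learn (MARSpline is omitted here; logistic regression and a small neural network stand in, and the factor table is hypothetical):

```python
# A minimal sketch (assumed workflow, not the authors' GIS pipeline):
# success rate = AUC on the model-building part, prediction rate = AUC on
# the held-out validation part.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("oudka_factors.csv")          # hypothetical factor table
X, y = df.drop(columns=["landslide"]), df["landslide"]
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_fit)
models = {
    "LR": LogisticRegression(max_iter=1000),
    "ANN": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}
for name, m in models.items():
    m.fit(scaler.transform(X_fit), y_fit)
    success = roc_auc_score(y_fit, m.predict_proba(scaler.transform(X_fit))[:, 1])
    prediction = roc_auc_score(y_val, m.predict_proba(scaler.transform(X_val))[:, 1])
    print(f"{name}: success rate AUC = {success:.3f}, prediction rate AUC = {prediction:.3f}")
```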


2020 ◽  
Vol 41 (S1) ◽  
pp. s396-s397
Author(s):  
Qunna Li ◽  
Minn Soe ◽  
Allan Nkwata ◽  
Victoria Russo ◽  
Margaret Dudeck ◽  
...  

Background: Surveillance data for surgical site infections (SSIs) following abdominal hysterectomy (HYST) have been reported to the CDC NHSN since 2005. Beginning in 2012, HYST SSI surveillance coverage expanded substantially as a result of a CMS mandatory reporting requirement under the Hospital Inpatient Quality Reporting Program. A trend analysis of HYST SSI using data submitted to the NHSN has not been previously reported. To estimate the overall trend of HYST SSI incidence rates, we analyzed data reported from acute-care hospitals with surgery performed between January 1, 2009, and December 31, 2018. Methods: We analyzed inpatient adult HYST procedures with primary closure resulting in deep incisional primary and organ-space SSIs detected during the same hospitalization or rehospitalization to the same hospital. SSIs reported as infection present at time of surgery (PATOS) were included in the analysis. Because of the surveillance definition changes for primary closure in 2013 and 2015, these changes were tested separately as interruptions to the HYST SSI outcome using an interrupted time-series model with mixed-effects logistic regression. Because these changes were not significantly associated with changes in HYST SSI risk, mixed-effects logistic regression was used to estimate the annual change in the log odds of HYST SSI. The estimates were adjusted for the following covariates: hospital bed size, general anesthesia, scope, ASA score, wound classification, medical school affiliation type, procedure duration, and age. Results: The number of hospitals and procedures reported to the NHSN for HYST increased and then stabilized after 2012 (Table 1). The unadjusted annual SSI incidence rates ranged from 0.60% to 0.81%. Based on the model, we estimate a 2.58% decrease in the odds of having a HYST SSI annually after controlling for the variables mentioned above (Table 2). Conclusions: The volume of hospitals and procedures for HYST reported to the NHSN increased substantially because of the CMS reporting requirement implemented in 2012. The overall adjusted HYST SSI odds ratio decreased annually over 2009–2018, which indicates progress in preventing HYST SSIs. Funding: None. Disclosures: None
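A simplified sketch of this kind of trend estimate (hospital-level random effects and part of the covariate set are omitted; file and column names are hypothetical), with indicator terms for the 2013 and 2015 definition changes and the annual change in odds derived from the year coefficient:

```python
# A minimal sketch (a simplification, not the NHSN analysis): plain
# binomial GLM standing in for the mixed-effects model.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("hyst_procedures.csv")  # one row per procedure: ssi (0/1), year, bed_size, asa, duration, age
df["post_2013"] = (df["year"] >= 2013).astype(int)  # definition-change interruptions
df["post_2015"] = (df["year"] >= 2015).astype(int)

fit = smf.glm(
    "ssi ~ year + post_2013 + post_2015 + bed_size + asa + duration + age",
    data=df, family=sm.families.Binomial(),
).fit()

# Annual percent change in the odds of SSI implied by the year coefficient
annual_change = (1.0 - np.exp(fit.params["year"])) * 100
print(f"Estimated annual decrease in the odds of SSI: {annual_change:.2f}%")
```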


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when the training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU, however, required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically lower than those of the RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, minimal variation in overall accuracy between very large and small sample sets, and relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as the performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.
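A minimal sketch, not the authors' GEOBIA workflow, of measuring how overall accuracy responds to training set size for a few of the classifiers, assuming a hypothetical table of pre-computed image-object attributes:

```python
# A minimal sketch (illustrative only); the file, columns, and classifier
# settings are assumptions, and the pool is assumed to hold at least
# 10,000 labeled image-objects.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("image_objects.csv")          # hypothetical object/attribute table
X, y = df.drop(columns=["land_cover"]), df["land_cover"]
X_pool, X_test, y_pool, y_test = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=500, random_state=1),
    "SVM": SVC(),
    "kNN": KNeighborsClassifier(),
}
for n in (10000, 1000, 315, 40):
    X_train, _, y_train, _ = train_test_split(X_pool, y_pool, train_size=n, stratify=y_pool, random_state=1)
    for name, clf in classifiers.items():
        acc = accuracy_score(y_test, clf.fit(X_train, y_train).predict(X_test))
        print(f"n={n:>5} {name}: overall accuracy = {acc:.3f}")
```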

