Predicting the outcrop of pre-Quaternary formations in the Dorog Basin (Hungary) using random forest classification

Mapping Intimacies ◽

10.5194/egusphere-egu2020-7255 ◽

2020 ◽

Author(s):

Reka Pogacsas ◽

Gáspár Albert

Keyword(s):

Remote Sensing ◽

Random Forest ◽

Slope Angle ◽

Morphological Characteristics ◽

Training Data ◽

Topographic Wetness Index ◽

Unique Region ◽

Random Forest Classification ◽

Forest Classification ◽

Geological Map

The Dorog Basin is a morphologically unique region of the Transdanubian Mountains that reveals the combined work of tectonic forces and erosion. Numerous NW-SE striking half-graben and horst structures are present, overprinted by the forms of fluvial erosion. The surface is dominantly covered by loose, 1–15 m thick Quaternary sediments (aeolian loess, and siliciclastic alluvial and colluvial formations), while the lithified bedrock consists of Mesozoic carbonates, Paleogene limestones, marls and sandstones, and limnic coal sequences. The rheological difference between the Quaternary and pre-Quaternary formations is so pronounced that the morphological characteristics of their outcrops also differ significantly. The area has been a focus of geologists for many decades because of its Eocene coal beds, and a renewal of the geological map of the region is in progress. The current research aims to assist the mapping with multivariate methods based on geomorphological attributes such as slope angle, aspect, profile curvature, height, and topographic wetness index. We perform a random forest classification (RFC) using these variables to predict the outcrops of pre-Quaternary formations in the study area. Random forest is a powerful tool for multivariate classification that uses several decision trees, each of which makes a prediction; the most popular prediction becomes the overall result [1]. It is gaining popularity in spatial prediction because of its high accuracy in classifying raster-type objects [2]. We used raster-type spatial data as the subject of the RFC, predicting a result for each pixel. The geology of the study area was known from previous geological mapping [3]. Morphological information was derived from the MERIT DEM. Our model used a raster with multiple bands containing geomorphological variables, and training data from the digitized geological map. The number of random samples was 2500.
After testing several combinations of the bands and several spacings of the study areas, the best prediction has approximately 80% accuracy. Model validation is based on the rate of well-predicted pixels in the same rasterized geological map that was used for training. Our aim was to use exact data, which holds fully for remotely sensed images but not for geological maps. The accuracy can therefore still be improved with field observations or borehole data. References: [1] Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18-22. [2] Belgiu, M., & Drăguţ, L. (2016). Random forest in remote sensing: A review of applications and future directions. ISPRS Journal of Photogrammetry and Remote Sensing, 114, 24-31. [3] Gidai, L., Nagy, G., & Siposs, Z. (1981). Geological map of the Dorog Basin 1:25 000. [in Hungarian] Geological Institute of Hungary, Budapest.
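The two mechanics named in this abstract, majority voting across trees and validation by the rate of well-predicted pixels, can be illustrated with a minimal sketch. It is not the authors' implementation (the abstract cites the R randomForest package); the function names, class labels, and tiny rasters below are hypothetical and chosen only for illustration.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees for one pixel."""
    return Counter(tree_predictions).most_common(1)[0][0]


def pixel_accuracy(predicted, reference):
    """Share of pixels whose predicted class equals the reference class."""
    pairs = [
        (p, r)
        for p_row, r_row in zip(predicted, reference)
        for p, r in zip(p_row, r_row)
    ]
    return sum(p == r for p, r in pairs) / len(pairs)


# Hypothetical votes of five decision trees for a single pixel.
votes = ["Quaternary", "pre-Quaternary", "pre-Quaternary",
         "Quaternary", "pre-Quaternary"]
winner = majority_vote(votes)

# Hypothetical 2 x 3 rasters: "Q" = Quaternary, "P" = pre-Quaternary.
predicted = [["Q", "Q", "P"], ["P", "Q", "P"]]
reference = [["Q", "P", "P"], ["P", "Q", "Q"]]
acc = pixel_accuracy(predicted, reference)
```

Here `reference` stands in for the rasterized geological map; in the study the same map supplied the training samples, so the rate measures agreement with the map rather than with independent field data.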


Random forest classification of morphology in the northern Gerecse (Hungary) to predict landslide-prone slopes

10.5194/egusphere-egu2020-8365 ◽

2020 ◽

Author(s):

Gáspár Albert ◽

Dávid Gerzsenyi

Keyword(s):

Random Forest ◽

Decision Trees ◽

Training Data ◽

Danube River ◽

Predictor Variables ◽

Geological Data ◽

Random Forest Classification ◽

Fluvial Terraces ◽

Forest Classification ◽

Geological Map

The morphology of the Gerecse Hills bears the imprints of fluvial terraces of the Danube River, Neogene tectonism and Quaternary erosion. The solid bedrock is composed of Mesozoic and Paleogene limestones, marls, and sandstones, and is covered by 1–15 m thick layers of unconsolidated Quaternary fluvial, lacustrine, and aeolian sediments. Hillslopes, stream valleys, and loessy riverside bluffs are prone to landslides, which have caused serious damage in inhabited and agricultural areas in the past. Attempts have been made to map these landslides, and the observations have been documented in the National Landslide Cadastre (NLC) inventory since the 1970s. These records are sporadic, concentrate only on certain locations, and often describe the state and extent of the landslides inaccurately. The aim of the present study was to complete and correct the landslide inventory by using quantitative modelling. Across the 480 sq. km study area, all records of the inventory were revisited and corrected. Using objective criteria, the renewed records and additional sample locations were sorted into one of the following morphological categories: scarps, debris, transitional area, stable accumulation areas, stable hilltops, and stable slopes. The categorized map of these observations served as training data for the random forest classification (RFC). Random forest is a powerful tool for multivariate classification that uses several decision trees. In our case, the predictions were made for each pixel of medium-resolution (~10 m) rasters. The predictor variables of the decision trees were morphometric and geological indices. The terrain indices were derived from the MERIT DEM with SAGA routines, and the categorized geological data come from a medium-scale geological map [1].
The predictor variables were packed in a multi-band raster and the RFC method was executed using R 3.5 with RStudio. After testing several combinations of the predictor variables and two different categorisations of the geological data, the best prediction has approximately 80% accuracy. The validation of the model is based on the rate of well-predicted pixels compared to the total cell count of the training data. The results showed that the probable location of landslide-prone slopes is not restricted to the areas recorded in the National Landslide Cadastre inventory. Based on the model, only ~6% of the estimated location of the highly unstable slopes (scarps) falls within the NLC polygons in the study area. The project was partly supported by the Thematic Excellence Program, Industry and Digitization Subprogram, NRDI Office, project no. ED_18-1-2019-0030 (on the part of G. Albert) and the ÚNKP-19-3 New National Excellence Program of the Ministry for Innovation and Technology (on the part of D. Gerzsenyi). Reference: [1] Gyalog L., and Síkhegyi F., eds. Geological map of Hungary (scale: 1:100 000). Budapest, Hungary, Geological Institute of Hungary, 2005.
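The closing statistic, the share of predicted scarp pixels that falls inside the inventory polygons, amounts to a simple raster overlay. The sketch below shows one way to compute such a share; the function name, the small class raster, and the 0/1 inventory mask are hypothetical and do not reproduce the study's data or its R workflow.

```python
def share_inside_mask(class_raster, mask, target):
    """Fraction of pixels of class `target` located where `mask` is truthy."""
    total = 0
    inside = 0
    for class_row, mask_row in zip(class_raster, mask):
        for cls, in_polygon in zip(class_row, mask_row):
            if cls == target:
                total += 1
                inside += bool(in_polygon)
    # Return 0.0 when the target class does not occur at all.
    return inside / total if total else 0.0


# Hypothetical predicted morphological classes on a 3 x 4 grid.
predicted = [
    ["scarp", "slope", "scarp", "hilltop"],
    ["debris", "scarp", "slope", "scarp"],
    ["slope", "slope", "scarp", "debris"],
]
# Hypothetical rasterized inventory polygons (1 = inside a polygon).
inventory = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
share = share_inside_mask(predicted, inventory, "scarp")
```

In this toy grid one of the five predicted scarp pixels lies inside the mask, so the share is 0.2.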


CHIRPS: Explaining random forest classification

Artificial Intelligence Review ◽

10.1007/s10462-020-09833-6 ◽

2020 ◽

Vol 53 (8) ◽

pp. 5747-5788

Author(s):

Julian Hatwell ◽

Mohamed Medhat Gaber ◽

R. Muhammad Atif Azad

Keyword(s):

Random Forest ◽

Pattern Mining ◽

Frequent Pattern Mining ◽

Training Data ◽

Frequent Pattern ◽

Data Sets ◽

Random Forest Classification ◽

Human In The Loop ◽

Forest Classification ◽

Unseen Data

Modern machine learning methods typically produce “black box” models that are opaque to interpretation. Yet demand for them has been increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate this interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification, and then uses frequent pattern mining to identify the most commonly occurring split conditions. Then a simple, conjunctive form rule is constructed where the antecedent terms are derived from the attributes that had the most influence on the classification. This rule is returned alongside estimates of the rule’s precision and coverage on the training data along with counter-factual details. An experimental study involving nine data sets shows that classification rules returned by CHIRPS have a precision at least as high as the state of the art when evaluated on unseen data (0.91–0.99) and offer a much greater coverage (0.04–0.54). Furthermore, CHIRPS uniquely controls against under- and over-fitting solutions by maximising novel objective functions that are better suited to the local (per instance) explanation setting.
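The pipeline described here (collect decision paths, find frequently occurring split conditions, form a conjunctive rule, then estimate its precision and coverage) can be illustrated with a heavily simplified sketch. This is not the CHIRPS algorithm: it counts only exactly repeated conditions instead of running frequent pattern mining, and it omits the objective functions and counter-factuals. All names, paths, thresholds, and data are hypothetical.

```python
from collections import Counter


def build_rule(paths, min_support):
    """Keep split conditions found in at least `min_support` of the paths."""
    counts = Counter(cond for path in paths for cond in set(path))
    threshold = min_support * len(paths)
    return sorted(c for c, n in counts.items() if n >= threshold)


def satisfies(instance, rule):
    """True if the instance meets every (feature, op, value) term."""
    for feature, op, value in rule:
        x = instance[feature]
        if op == "<=":
            ok = x <= value
        elif op == ">":
            ok = x > value
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def precision_and_coverage(rule, rows, labels, target):
    """Precision: share of covered rows with label `target`.
    Coverage: share of all rows that the rule covers."""
    covered = [y for x, y in zip(rows, labels) if satisfies(x, rule)]
    coverage = len(covered) / len(rows)
    precision = covered.count(target) / len(covered) if covered else 0.0
    return precision, coverage


# Hypothetical paths from three trees that voted for class "A".
paths = [
    [("x1", ">", 10.0), ("x2", "<=", 5.0)],
    [("x1", ">", 10.0), ("x3", ">", 200.0)],
    [("x2", "<=", 5.0), ("x1", ">", 10.0)],
]
rule = build_rule(paths, min_support=0.6)

# Hypothetical training rows and labels.
rows = [
    {"x1": 12, "x2": 4.0, "x3": 250},
    {"x1": 15, "x2": 3.0, "x3": 150},
    {"x1": 11, "x2": 4.5, "x3": 100},
    {"x1": 5, "x2": 4.0, "x3": 300},
    {"x1": 20, "x2": 8.0, "x3": 220},
]
labels = ["A", "A", "B", "B", "A"]
precision, coverage = precision_and_coverage(rule, rows, labels, "A")
```

With these toy inputs the rule keeps the two conditions shared by at least 60% of the paths; it covers three of the five rows, two of which carry the target label.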


Application of random forest classification and remotely sensed data in geological mapping on the Jebel Meloussi area (Tunisia)

Arabian Journal of Geosciences ◽

10.1007/s12517-021-08509-x ◽

2021 ◽

Vol 14 (21) ◽

Author(s):

Gáspár Albert ◽

Seif Ammar

Keyword(s):

Random Forest ◽

Remotely Sensed ◽

Geological Mapping ◽

Vegetation Coverage ◽

Remotely Sensed Data ◽

Random Forest Classification ◽

Radar Images ◽

Forest Classification ◽

Geological Map ◽

Sentinel 2

Remotely sensed data such as satellite photos and radar images can be used to produce geological maps in arid regions, where the vegetation coverage does not have a significant effect. In central Tunisia, the Jebel Meloussi area has unique geological features and characteristic morphology (i.e. flat areas with dune fields in contrast with hills of folded and eroded stratigraphic sequences), which makes it an ideal area for testing new methods of automatic terrain classification. For this, data from the Sentinel 2 satellite sensor and the SRTM-based MERIT DEM (digital elevation model) were used in the present study. Using R scripts and the random forest classification method, modelling was performed on four lithological variables, derived from the different bands of the Sentinel 2 images, and two morphometric parameters for the area of the 1:50,000 geological map sheet no. 103. The four lithological variables were chosen to highlight the iron-bearing minerals, since the spectral parameters of the Sentinel 2 sensors are especially useful for this purpose. The training areas of the classification were selected on the geological map. The modelling identified Eocene and Cretaceous evaporite-bearing sedimentary series (such as the Jebs and the Bouhedma Formations) with the highest producer accuracy (> 60% of the predicted pixels match the map). The pyritic argillites of the Sidi Khalif Formation were also recognized with the same accuracy, and the Quaternary sebkhas and dunes were also well predicted. The study concludes that the classification-based geological map is useful for field geologists prior to field surveys.
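Producer accuracy, the per-class measure quoted in this abstract, is conventionally the number of correctly classified pixels of a class divided by the number of reference (map) pixels of that class. The sketch below follows that conventional definition, which may differ in detail from how the authors computed their figures; the class labels and pixel lists are hypothetical.

```python
from collections import Counter


def producer_accuracy(reference, predicted):
    """Per-class share of reference pixels that were predicted correctly."""
    ref_totals = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {cls: correct[cls] / n for cls, n in ref_totals.items()}


# Hypothetical reference (map) and predicted classes for ten pixels.
reference = ["evaporite"] * 5 + ["argillite"] * 3 + ["dune"] * 2
predicted = [
    "evaporite", "evaporite", "evaporite", "argillite", "dune",
    "argillite", "argillite", "evaporite",
    "dune", "dune",
]
accuracy_by_class = producer_accuracy(reference, predicted)
```

Reporting the measure per class, rather than as one overall figure, is what lets a map user see which formations the classifier recovers reliably.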


Identifying European Old-Growth Forests using Remote Sensing: A Study in the Ukrainian Carpathians

Forests ◽

10.3390/f10020127 ◽

2019 ◽

Vol 10 (2) ◽

pp. 127 ◽

Cited By ~ 6

Author(s):

Benedict D. Spracklen ◽

Dominick V. Spracklen

Keyword(s):

Remote Sensing ◽

Random Forest ◽

Norway Spruce ◽

Satellite Images ◽

Textural Features ◽

Old Growth ◽

Random Forest Classification ◽

Forest Classification ◽

Old Growth Forests ◽

Sentinel 2

Old-growth forests are an important, rare and endangered habitat in Europe. The ability to identify old-growth forests through remote sensing would be helpful for both conservation and forest management. We used data on beech, Norway spruce and mountain pine old-growth forests in the Ukrainian Carpathians to test whether Sentinel-2 satellite images could be used to correctly identify these forests. We used summer and autumn 2017 Sentinel-2 satellite images comprising 10 and 20 m resolution bands to create 6 vegetation indices and 9 textural features. We used a Random Forest classification model to discriminate between dominant tree species within old-growth forests and between old-growth and other forest types. Beech and Norway spruce were identified with an overall accuracy of around 90%, with a lower performance for mountain pine (70%) and mixed forest (40%). Old-growth forests were identified with an overall classification accuracy of 85%. Adding textural features, band standard deviations and elevation data improved accuracies by 3.3%, 2.1% and 1.8% respectively, while using combined summer and autumn images increased accuracy by 1.2%. We conclude that Random Forest classification combined with Sentinel-2 images can provide an effective option for identifying old-growth forests in Europe.
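The abstract does not name its six vegetation indices or nine textural features. As an assumed illustration only, the sketch below computes NDVI, a widely used vegetation index, and a band standard deviation over a small pixel window, the kind of per-pixel predictor that could feed a Random Forest model. The function names, window size, and reflectance values are hypothetical.

```python
from statistics import pstdev


def ndvi(nir, red):
    """Normalized difference vegetation index: (NIR - red) / (NIR + red)."""
    total = nir + red
    return (nir - red) / total if total else 0.0


def patch_std(band, row, col, radius=1):
    """Population standard deviation of a band in a window around a pixel.
    The window is clipped at the raster edges."""
    rows = range(max(0, row - radius), min(len(band), row + radius + 1))
    cols = range(max(0, col - radius), min(len(band[0]), col + radius + 1))
    return pstdev([band[r][c] for r in rows for c in cols])


# Hypothetical reflectances for one pixel.
index_value = ndvi(nir=0.6, red=0.2)

# Hypothetical 3 x 3 band with one bright centre pixel.
band = [
    [1.0, 1.0, 1.0],
    [1.0, 4.0, 1.0],
    [1.0, 1.0, 1.0],
]
centre_std = patch_std(band, 1, 1)
```

A uniform window yields a standard deviation of zero, while a mixed window such as the one above yields a positive value, which is why such statistics can act as a simple texture measure.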


Machine Learning Techniques for Intrusion Detection

Handbook of Research on Intrusion Detection Systems - Advances in Information Security, Privacy, and Ethics ◽

10.4018/978-1-7998-2242-4.ch003 ◽

2020 ◽

pp. 47-65

Author(s):

Tameem Ahmad ◽

Mohd Asad Anwar ◽

Misbahul Haque

Keyword(s):

Random Forest ◽

Intrusion Detection ◽

False Alarm ◽

False Alarm Rate ◽

Detection Rate ◽

Clustering Algorithms ◽

Training Data ◽

Hybrid Classifier ◽

Random Forest Classification ◽

Forest Classification

This chapter proposes a hybrid classifier technique for network Intrusion Detection Systems by implementing a method that combines the Random Forest classification technique with K-Means and Gaussian Mixture clustering algorithms. Random forest builds patterns of intrusion over training data for misuse detection, while anomaly-detection intrusions are identified by the outlier-detection mechanism. The implementation and simulation of the proposed method are carried out under varying threshold values. The effectiveness of the proposed method has been evaluated on metrics such as precision, recall, accuracy rate, false alarm rate, and detection rate. The various existing algorithms are analyzed extensively. It is observed experimentally that the proposed method gives superior results compared to existing simpler classifiers as well as existing hybrid classifier techniques. The proposed hybrid classifier technique outperforms other common existing classifiers with an accuracy of 99.84%, a false alarm rate of 0.09% and a detection rate of 99.7%.
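The evaluation metrics listed in this abstract can all be derived from the four counts of a binary confusion matrix. The sketch below uses the conventional definitions, in which detection rate coincides with recall; the chapter may define them slightly differently. The function name and counts are hypothetical, and every denominator is assumed to be non-zero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Conventional intrusion-detection metrics from confusion counts.

    tp: attacks flagged as attacks     fp: normal traffic flagged as attacks
    tn: normal traffic passed          fn: attacks missed
    """
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "detection_rate": tp / (tp + fn),  # same as recall by convention
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "false_alarm_rate": fp / (fp + tn),
    }


# Hypothetical counts for 1000 classified connections.
metrics = detection_metrics(tp=90, fp=5, tn=895, fn=10)
```

Because normal traffic usually dominates, accuracy alone can look high even when many attacks are missed, which is why false alarm rate and detection rate are reported alongside it.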
