scholarly journals Identification of intrusive lithologies in volcanic terrains in British Columbia by machine learning using random forests: The value of using a soft classifier

Geophysics ◽  
2020 ◽  
Vol 85 (6) ◽  
pp. B249-B258
Author(s):  
Stephen Kuhn ◽  
Matthew J. Cracknell ◽  
Anya M. Reading ◽  
Stephanie Sykora

Identifying the location of intrusions is a key component in exploration for porphyry Cu ± Mo ± Au deposits. In typical porphyry terrains, in the absence of outcrop, intrusions can be difficult to discriminate from the compositionally similar volcanic and volcanoclastic sedimentary rocks in which they are emplaced. The ability to produce lithological maps at an early exploration stage can significantly reduce costs by assisting in planning and prioritization of detailed mapping and sampling. Additionally, a data-driven strategy provides opportunity for the discovery of intrusions not identified during conventional mapping and interpretation. We used random forests (RF), a supervised machine-learning algorithm, to classify rock types throughout the Kliyul porphyry prospect in British Columbia, Canada. Rock types determined at geochemical sampling sites were used as training data. Airborne magnetic and radiometric data, geochemistry, and topographic data were used in classification. Results were validated using First Quantum Minerals’ geologic map, which includes additional detail from targeted location and transect mapping. The petrophysical and compositional similarity of rock types resulted in a noisy classification. Intrusions, particularly the more discrete, were inconsistently predicted, likely due to their limited extent relative to data sampling intervals. Closer examination of class membership probabilities (CMPs) identified locations where the probability of an intrusion being present was elevated significantly above the background. Indeed, a large proportion of mapped intrusions correspond to areas of elevated probability and, importantly, areas were highlighted as potential intrusions that were not identified in geologic mapping. The RF classification produced a reasonable lithological map, if lacking in resolution, but more significantly, great benefit comes from the insights drawn from the RF CMPs. Mapping the spatial distribution of elevated intrusion CMP, a soft classifier approach, produced a map product that can target intrusions and prioritize detailed mapping for mineral exploration.

Author(s):  
Kazuko Fuchi ◽  
Eric M. Wolf ◽  
David S. Makhija ◽  
Nathan A. Wukie ◽  
Christopher R. Schrock ◽  
...  

Abstract A machine learning algorithm that performs multifidelity domain decomposition is introduced. While the design of complex systems can be facilitated by numerical simulations, the determination of appropriate physics couplings and levels of model fidelity can be challenging. The proposed method automatically divides the computational domain into subregions and assigns required fidelity level, using a small number of high fidelity simulations to generate training data and low fidelity solutions as input data. Unsupervised and supervised machine learning algorithms are used to correlate features from low fidelity solutions to fidelity assignment. The effectiveness of the method is demonstrated in a problem of viscous fluid flow around a cylinder at Re ≈ 20. Ling et al. built physics-informed invariance and symmetry properties into machine learning models and demonstrated improved model generalizability. Along these lines, we avoid using problem dependent features such as coordinates of sample points, object geometry or flow conditions as explicit inputs to the machine learning model. Use of pointwise flow features generates large data sets from only one or two high fidelity simulations, and the fidelity predictor model achieved 99.5% accuracy at training points. The trained model was shown to be capable of predicting a fidelity map for a problem with an altered cylinder radius. A significant improvement in the prediction performance was seen when inputs are expanded to include multiscale features that incorporate neighborhood information.


2020 ◽  
Vol 222 (3) ◽  
pp. 1750-1764 ◽  
Author(s):  
Yangkang Chen

SUMMARY Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used short-term-average long-term-average ratio (STA/LTA) based arrival picking algorithms suffer from the sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example, removing sufficient amount of noise, and second analysed by arrival pickers. To conquer the noise issue in arrival picking for weak microseismic or earthquake event, I leverage the machine learning techniques to help recognizing seismic waveforms in microseismic or earthquake data. Because of the dependency of supervised machine learning algorithm on large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for such purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.


Author(s):  
Anisha M. Lal ◽  
B. Koushik Reddy ◽  
Aju D.

Machine learning can be defined as the ability of a computer to learn and solve a problem without being explicitly coded. The efficiency of the program increases with experience through the task specified. In traditional programming, the program and the input are specified to get the output, but in the case of machine learning, the targets and predictors are provided to the algorithm make the process trained. This chapter focuses on various machine learning techniques and their performance with commonly used datasets. A supervised learning algorithm consists of a target variable that is to be predicted from a given set of predictors. Using these established targets is a function that plots targets to a given set of predictors. The training process allows the system to train the unknown data and continues until the model achieves a desired level of accuracy on the training data. The supervised methods can be usually categorized as classification and regression. This chapter discourses some of the popular supervised machine learning algorithms and their performances using quotidian datasets. This chapter also discusses some of the non-linear regression techniques and some insights on deep learning with respect to object recognition.


2019 ◽  
Author(s):  
Andrew Medford ◽  
Shengchun Yang ◽  
Fuzhu Liu

Understanding the interaction of multiple types of adsorbate molecules on solid surfaces is crucial to establishing the stability of catalysts under various chemical environments. Computational studies on the high coverage and mixed coverages of reaction intermediates are still challenging, especially for transition-metal compounds. In this work, we present a framework to predict differential adsorption energies and identify low-energy structures under high- and mixed-adsorbate coverages on oxide materials. The approach uses Gaussian process machine-learning models with quantified uncertainty in conjunction with an iterative training algorithm to actively identify the training set. The framework is demonstrated for the mixed adsorption of CH<sub>x</sub>, NH<sub>x</sub> and OH<sub>x</sub> species on the oxygen vacancy and pristine rutile TiO<sub>2</sub>(110) surface sites. The results indicate that the proposed algorithm is highly efficient at identifying the most valuable training data, and is able to predict differential adsorption energies with a mean absolute error of ~0.3 eV based on <25% of the total DFT data. The algorithm is also used to identify 76% of the low-energy structures based on <30% of the total DFT data, enabling construction of surface phase diagrams that account for high and mixed coverage as a function of the chemical potential of C, H, O, and N. Furthermore, the computational scaling indicates the algorithm scales nearly linearly (N<sup>1.12</sup>) as the number of adsorbates increases. This framework can be directly extended to metals, metal oxides, and other materials, providing a practical route toward the investigation of the behavior of catalysts under high-coverage conditions.


2019 ◽  
Vol 23 (1) ◽  
pp. 12-21 ◽  
Author(s):  
Shikha N. Khera ◽  
Divya

Information technology (IT) industry in India has been facing a systemic issue of high attrition in the past few years, resulting in monetary and knowledge-based loses to the companies. The aim of this research is to develop a model to predict employee attrition and provide the organizations opportunities to address any issue and improve retention. Predictive model was developed based on supervised machine learning algorithm, support vector machine (SVM). Archival employee data (consisting of 22 input features) were collected from Human Resource databases of three IT companies in India, including their employment status (response variable) at the time of collection. Accuracy results from the confusion matrix for the SVM model showed that the model has an accuracy of 85 per cent. Also, results show that the model performs better in predicting who will leave the firm as compared to predicting who will not leave the company.


2021 ◽  
Vol 11 (3) ◽  
pp. 92
Author(s):  
Mehdi Berriri ◽  
Sofiane Djema ◽  
Gaëtan Rey ◽  
Christel Dartigues-Pallez

Today, many students are moving towards higher education courses that do not suit them and end up failing. The purpose of this study is to help provide counselors with better knowledge so that they can offer future students courses corresponding to their profile. The second objective is to allow the teaching staff to propose training courses adapted to students by anticipating their possible difficulties. This is possible thanks to a machine learning algorithm called Random Forest, allowing for the classification of the students depending on their results. We had to process data, generate models using our algorithm, and cross the results obtained to have a better final prediction. We tested our method on different use cases, from two classes to five classes. These sets of classes represent the different intervals with an average ranging from 0 to 20. Thus, an accuracy of 75% was achieved with a set of five classes and up to 85% for sets of two and three classes.


Friction ◽  
2021 ◽  
Author(s):  
Vigneashwara Pandiyan ◽  
Josef Prost ◽  
Georg Vorlaufer ◽  
Markus Varga ◽  
Kilian Wasmer

AbstractFunctional surfaces in relative contact and motion are prone to wear and tear, resulting in loss of efficiency and performance of the workpieces/machines. Wear occurs in the form of adhesion, abrasion, scuffing, galling, and scoring between contacts. However, the rate of the wear phenomenon depends primarily on the physical properties and the surrounding environment. Monitoring the integrity of surfaces by offline inspections leads to significant wasted machine time. A potential alternate option to offline inspection currently practiced in industries is the analysis of sensors signatures capable of capturing the wear state and correlating it with the wear phenomenon, followed by in situ classification using a state-of-the-art machine learning (ML) algorithm. Though this technique is better than offline inspection, it possesses inherent disadvantages for training the ML models. Ideally, supervised training of ML models requires the datasets considered for the classification to be of equal weightage to avoid biasing. The collection of such a dataset is very cumbersome and expensive in practice, as in real industrial applications, the malfunction period is minimal compared to normal operation. Furthermore, classification models would not classify new wear phenomena from the normal regime if they are unfamiliar. As a promising alternative, in this work, we propose a methodology able to differentiate the abnormal regimes, i.e., wear phenomenon regimes, from the normal regime. This is carried out by familiarizing the ML algorithms only with the distribution of the acoustic emission (AE) signals captured using a microphone related to the normal regime. As a result, the ML algorithms would be able to detect whether some overlaps exist with the learnt distributions when a new, unseen signal arrives. To achieve this goal, a generative convolutional neural network (CNN) architecture based on variational auto encoder (VAE) is built and trained. During the validation procedure of the proposed CNN architectures, we were capable of identifying acoustics signals corresponding to the normal and abnormal wear regime with an accuracy of 97% and 80%. Hence, our approach shows very promising results for in situ and real-time condition monitoring or even wear prediction in tribological applications.


2021 ◽  
Vol 13 (3) ◽  
pp. 368
Author(s):  
Christopher A. Ramezan ◽  
Timothy A. Warner ◽  
Aaron E. Maxwell ◽  
Bradley S. Price

The size of the training data set is a major determinant of classification accuracy. Nevertheless, the collection of a large training data set for supervised classifiers can be a challenge, especially for studies covering a large area, which may be typical of many real-world applied projects. This work investigates how variations in training set size, ranging from a large sample size (n = 10,000) to a very small sample size (n = 40), affect the performance of six supervised machine-learning algorithms applied to classify large-area high-spatial-resolution (HR) (1–5 m) remotely sensed data within the context of a geographic object-based image analysis (GEOBIA) approach. GEOBIA, in which adjacent similar pixels are grouped into image-objects that form the unit of the classification, offers the potential benefit of allowing multiple additional variables, such as measures of object geometry and texture, thus increasing the dimensionality of the classification input data. The six supervised machine-learning algorithms are support vector machines (SVM), random forests (RF), k-nearest neighbors (k-NN), single-layer perceptron neural networks (NEU), learning vector quantization (LVQ), and gradient-boosted trees (GBM). RF, the algorithm with the highest overall accuracy, was notable for its negligible decrease in overall accuracy, 1.0%, when training sample size decreased from 10,000 to 315 samples. GBM provided similar overall accuracy to RF; however, the algorithm was very expensive in terms of training time and computational resources, especially with large training sets. In contrast to RF and GBM, NEU, and SVM were particularly sensitive to decreasing sample size, with NEU classifications generally producing overall accuracies that were on average slightly higher than SVM classifications for larger sample sizes, but lower than SVM for the smallest sample sizes. NEU however required a longer processing time. The k-NN classifier saw less of a drop in overall accuracy than NEU and SVM as training set size decreased; however, the overall accuracies of k-NN were typically less than RF, NEU, and SVM classifiers. LVQ generally had the lowest overall accuracy of all six methods, but was relatively insensitive to sample size, down to the smallest sample sizes. Overall, due to its relatively high accuracy with small training sample sets, and minimal variations in overall accuracy between very large and small sample sets, as well as relatively short processing time, RF was a good classifier for large-area land-cover classifications of HR remotely sensed data, especially when training data are scarce. However, as performance of different supervised classifiers varies in response to training set size, investigating multiple classification algorithms is recommended to achieve optimal accuracy for a project.


Genes ◽  
2021 ◽  
Vol 12 (4) ◽  
pp. 527
Author(s):  
Eran Elhaik ◽  
Dan Graur

In the last 15 years or so, soft selective sweep mechanisms have been catapulted from a curiosity of little evolutionary importance to a ubiquitous mechanism claimed to explain most adaptive evolution and, in some cases, most evolution. This transformation was aided by a series of articles by Daniel Schrider and Andrew Kern. Within this series, a paper entitled “Soft sweeps are the dominant mode of adaptation in the human genome” (Schrider and Kern, Mol. Biol. Evolut. 2017, 34(8), 1863–1877) attracted a great deal of attention, in particular in conjunction with another paper (Kern and Hahn, Mol. Biol. Evolut. 2018, 35(6), 1366–1371), for purporting to discredit the Neutral Theory of Molecular Evolution (Kimura 1968). Here, we address an alleged novelty in Schrider and Kern’s paper, i.e., the claim that their study involved an artificial intelligence technique called supervised machine learning (SML). SML is predicated upon the existence of a training dataset in which the correspondence between the input and output is known empirically to be true. Curiously, Schrider and Kern did not possess a training dataset of genomic segments known a priori to have evolved either neutrally or through soft or hard selective sweeps. Thus, their claim of using SML is thoroughly and utterly misleading. In the absence of legitimate training datasets, Schrider and Kern used: (1) simulations that employ many manipulatable variables and (2) a system of data cherry-picking rivaling the worst excesses in the literature. These two factors, in addition to the lack of negative controls and the irreproducibility of their results due to incomplete methodological detail, lead us to conclude that all evolutionary inferences derived from so-called SML algorithms (e.g., S/HIC) should be taken with a huge shovel of salt.


Sign in / Sign up

Export Citation Format

Share Document