Automatic classification of landslide kinematics using acoustic emission measurements and machine learning

AbstractFounded on understanding of a slope’s likely failure mechanism, an early warning system for instability should alert users of accelerating slope deformation behaviour to enable safety-critical decisions to be made. Acoustic emission (AE) monitoring of active waveguides (i.e. a steel tube with granular internal/external backfill installed through a slope) is becoming an accepted monitoring technology for soil slope stability applications; however, challenges still exist to develop widely applicable AE interpretation strategies. The objective of this study was to develop and demonstrate the use of machine learning (ML) approaches to automatically classify landslide kinematics using AE measurements, based on the standard landslide velocity scale. Datasets from large-scale slope failure simulation experiments were used to train and test the ML models. In addition, an example field application using data from a reactivated landslide at Hollin Hill, North Yorkshire, UK, is presented. The results show that ML can automatically classify landslide kinematics using AE measurements with the accuracy of more than 90%. The combination of two AE features, AE rate and AE rate gradient, enable both velocity and acceleration classifications. A conceptual framework is presented for how this automatic approach would be used for landslide early warning in the field, with considerations given to potentially limited site-specific training data.

Download Full-text

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Applied Sciences ◽

10.3390/app11020472 ◽

2021 ◽

Vol 11 (2) ◽

pp. 472

Author(s):

Hyeongmin Cho ◽

Sangkyun Lee

Keyword(s):

Machine Learning ◽

Data Quality ◽

Large Scale ◽

High Dimensional Data ◽

Quality Measures ◽

Training Data ◽

Measure Data ◽

High Dimensional ◽

Small Scale ◽

Class Separability

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.

Download Full-text

57 Precision neoantigen discovery using novel algorithms and expanded HLA-ligandome datasets

Journal for ImmunoTherapy of Cancer ◽

10.1136/jitc-2020-sitc2020.0057 ◽

2020 ◽

Vol 8 (Suppl 3) ◽

pp. A62-A62

Author(s):

Dattatreya Mellacheruvu ◽

Rachel Pyke ◽

Charles Abbott ◽

Nick Phillips ◽

Sejal Desai ◽

...

Keyword(s):

Machine Learning ◽

Cell Lines ◽

Antigen Processing ◽

Large Scale ◽

Prediction Models ◽

K562 Cells ◽

Machine Learning Algorithms ◽

Training Data ◽

High Quality ◽

Tissue Samples

BackgroundAccurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended upon our work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies.MethodsIn-house immunopeptidomic data was generated using stably transfected HLA-null K562 cells lines that express a single HLA allele of interest, followed by immunoprecipitation using W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples utilizing the ImmunoID NeXT Platform.ResultsWe have generated large-scale and high-quality immunopeptidomics data by using approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles to create our primary models. Briefly, our primary ‘binding’ algorithm models MHC-peptide binding using peptide and binding pockets while our primary ‘presentation’ model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test data sets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples.ConclusionsImproving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines, and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance compared to a state-of-the-art public algorithm and furthers this objective.

Download Full-text

A Physics-Infused Deep Learning Model for the Prediction of Refractive Indices and Its Use for the Large-Scale Screening of Organic Compound Space

10.26434/chemrxiv.8796950 ◽

2019 ◽

Author(s):

Mojtaba Haghighatlari ◽

Gaurav Vishwakarma ◽

Mohammad Atif Faiz Afzal ◽

Johannes Hachmann

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Large Scale ◽

Organic Molecules ◽

Learning Model ◽

Training Data ◽

Refractive Indices ◽

Learning Models ◽

Deep Learning Model ◽

Machine Learning Models

<div><div><div><p>We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.</p></div></div></div>

Download Full-text

Detecting Pressure Anomalies While Drilling Using a Machine Learning Hybrid Approach

10.2118/204035-ms ◽

2021 ◽

Author(s):

Aurore Lafond ◽

Maurice Ringer ◽

Florian Le Blay ◽

Jiaxu Liu ◽

Ekaterina Millan ◽

...

Keyword(s):

Machine Learning ◽

Data Quality ◽

Real Time ◽

Large Scale ◽

Hybrid Approach ◽

Physical Models ◽

Training Data ◽

Digital Data ◽

Machine Learning Techniques ◽

New System

Abstract Abnormal surface pressure is typically the first indicator of a number of problematic events, including kicks, losses, washouts and stuck pipe. These events account for 60–70% of all drilling-related nonproductive time, so their early and accurate detection has the potential to save the industry billions of dollars. Detecting these events today requires an expert user watching multiple curves, which can be costly, and subject to human errors. The solution presented in this paper is aiming at augmenting traditional models with new machine learning techniques, which enable to detect these events automatically and help the monitoring of the drilling well. Today’s real-time monitoring systems employ complex physical models to estimate surface standpipe pressure while drilling. These require many inputs and are difficult to calibrate. Machine learning is an alternative method to predict pump pressure, but this alone needs significant labelled training data, which is often lacking in the drilling world. The new system combines these approaches: a machine learning framework is used to enable automated learning while the physical models work to compensate any gaps in the training data. The system uses only standard surface measurements, is fully automated, and is continuously retrained while drilling to ensure the most accurate pressure prediction. In addition, a stochastic (Bayesian) machine learning technique is used, which enables not only a prediction of the pressure, but also the uncertainty and confidence of this prediction. Last, the new system includes a data quality control workflow. It discards periods of low data quality for the pressure anomaly detection and enables to have a smarter real-time events analysis. The new system has been tested on historical wells using a new test and validation framework. The framework runs the system automatically on large volumes of both historical and simulated data, to enable cross-referencing the results with observations. In this paper, we show the results of the automated test framework as well as the capabilities of the new system in two specific case studies, one on land and another offshore. Moreover, large scale statistics enlighten the reliability and the efficiency of this new detection workflow. The new system builds on the trend in our industry to better capture and utilize digital data for optimizing drilling.

Download Full-text

Quantification of landslide velocity from active waveguide–generated acoustic emission

Canadian Geotechnical Journal ◽

10.1139/cgj-2014-0226 ◽

2015 ◽

Vol 52 (4) ◽

pp. 413-425 ◽

Cited By ~ 22

Author(s):

Alister Smith ◽

Neil Dixon

Keyword(s):

Acoustic Emission ◽

Slope Failure ◽

Deformation Behaviour ◽

Slope Instability ◽

Metal Waveguide ◽

Soil Slopes ◽

Order Of Magnitude ◽

Active Waveguide ◽

The Relationship ◽

Better Than

Acoustic emission (AE) has become an established approach to monitor stability of soil slopes. However, the challenge has been to develop strategies to interpret and quantify deformation behaviour from the measured AE. AE monitoring of soil slopes commonly utilizes an active waveguide that is installed in a borehole through the slope and comprises a metal waveguide rod or tube with a granular backfill surround. When the host slope deforms, the column of granular backfill also deforms and this generates AE that can propagate along the waveguide. Results from the commissioning of dynamic shear apparatus used to subject full-scale active waveguide models to simulated slope movements are presented. The results confirm that AE rates generated are proportional to the rate of deformation, and the coefficient of proportionality that defines the relationship has been quantified (e.g., 4.4 × 105 for the angular gravel examined). It is demonstrated that slope velocities can be quantified continuously in real time through monitoring active waveguide–generated AE during a slope failure simulation. The results show that the technique can quantify landslide velocity to better than an order of magnitude (i.e., consistent with standard landslide movement classification) and can therefore be used to provide an early warning of slope instability through detecting and quantifying accelerations of slope movement.

Download Full-text

Prediction of Thermal Properties of Zeolites through Machine Learning

10.26434/chemrxiv-2021-m67lk-v3 ◽

2022 ◽

Author(s):

Maxime Ducamp ◽

François-Xavier Coudert

Keyword(s):

Machine Learning ◽

Thermal Properties ◽

Large Scale ◽

Materials Science ◽

Pore Space ◽

Harmonic Approximation ◽

Chemical Properties ◽

Training Data ◽

Gradient Boosting ◽

Geometric Descriptors

The use of machine learning for the prediction of physical and chemical properties of crystals based on their structure alone is currently an area of intense research in computational materials science. In this work, we studied the possibility of using machine learning-trained algorithms in order to calculate the thermal properties of siliceous zeolite frameworks. We used as training data the thermal properties of 120 zeolites, calculated at the DFT level, in the quasi-harmonic approximation. We compared the statistical accuracy of trained models (based on the gradient boosting regression technique) using different types of descriptors, including ad hoc geometrical features, topology, pore space, and general geometric descriptors. While geometric descriptors were found to perform best, we also identified limitations on the accuracy of the predictions, especially for a small group of materials with very highly negative thermal expansion coefficients. We then studied the generalizability of the technique, demonstrating that the predictions were not sensitive to the refinement of framework structures at a high level of theory. Therefore, the models are suitable for the exploration and screening of large-scale databases of hypothetical frameworks, which we illustrate on the PCOD2 database of zeolites containing around 600,000 hypothetical structures.

Download Full-text

Predicting in-hospital mortality in adult non-traumatic emergency department patients: a retrospective comparison of the Modified Early Warning Score (MEWS) and machine learning approach

PeerJ ◽

10.7717/peerj.11988 ◽

2021 ◽

Vol 9 ◽

pp. e11988

Author(s):

Kuan-Han Wu ◽

Fu-Jen Cheng ◽

Hsiang-Ling Tai ◽

Jui-Cheng Wang ◽

Yii-Ting Huang ◽

...

Keyword(s):

Machine Learning ◽

Emergency Department ◽

Hospital Mortality ◽

Early Warning ◽

Performance Test ◽

Model Development ◽

Training Data ◽

Early Warning Score ◽

Prediction Ability ◽

Specific Patient

Background A feasible and accurate risk prediction systems for emergency department (ED) patients is urgently required. The Modified Early Warning Score (MEWS) is a wide-used tool to predict clinical outcomes in ED. Literatures showed that machine learning (ML) had better predictability in specific patient population than traditional scoring system. By analyzing a large multicenter dataset, we aim to develop a ML model to predict in-hospital morality of the adult non traumatic ED patients for different time stages, and comparing performance with other ML models and MEWS. Methods A retrospective observational cohort study was conducted in five Taiwan EDs including two tertiary medical centers and three regional hospitals. All consecutively adult (>17 years old) non-traumatic patients admit to ED during a 9-year period (January first, 2008 to December 31th, 2016) were included. Exclusion criteria including patients with (1) out-of-hospital cardiac arrest and (2) discharge against medical advice and transferred to other hospital (3) missing collect variables. The primary outcome was in-hospital mortality and were categorized into 6, 24, 72, 168 hours mortality. MEWS was calculated by systolic blood pressure, pulse rate, respiratory rate, body temperature, and level of consciousness. An ensemble supervised stacking ML model was developed and compared to sensitive and unsensitive Xgboost, Random Forest, and Adaboost. We conducted a performance test and examine both the area under the receiver operating characteristic (AUROC) and the area under the precision and recall curve (AUPRC) as the comparative measures. Result After excluding 182,001 visits (7.46%), study group was consisted of 24,37,326 ED visits. The dataset was split into 67% training data and 33% test data for ML model development. There was no statistically difference found in the characteristics between two groups. For the prediction of 6, 24, 72, 168 hours in-hospital mortality, the AUROC of MEW and ML mode was 0.897, 0.865, 0.841, 0.816 and 0.939, 0.928, 0.913, 0.902 respectively. The stacking ML model outperform other ML model as well. For the prediction of in-hospital mortality over 48-hours, AUPRC performance of MEWS drop below 0.1, while the AUPRC of ML mode was 0.317 in 6 hours and 0.2150 in 168 hours. For each time frame, ML model achieved statistically significant higher AUROC and AUPRC than MEWS (all P < 0.001). Both models showed decreasing prediction ability as time elapse, but there was a trend that the gap of AUROC values between two model increases gradually (P < 0.001). Three MEWS thresholds (score >3, >4, and >5) were determined as baselines for comparison, ML mode consistently showed improved or equally performance in sensitivity, PPV, NPV, but not in specific. Conclusion Stacking ML methods improve predicted in-hospital mortality than MEWS in adult non-traumatic ED patients, especially in the prediction of delayed mortality.

Download Full-text

Prediction of Thermal Properties of Zeolites through Machine Learning

10.26434/chemrxiv-2021-m67lk-v2 ◽

2021 ◽

Author(s):

Maxime Ducamp ◽

François-Xavier Coudert

Keyword(s):

Machine Learning ◽

Thermal Properties ◽

Large Scale ◽

Materials Science ◽

Pore Space ◽

Harmonic Approximation ◽

Chemical Properties ◽

Training Data ◽

Gradient Boosting ◽

Geometric Descriptors

Download Full-text

Learning from the 2018 Western Japan Heavy Rains to Detect Floods during the 2019 Hagibis Typhoon

Remote Sensing ◽

10.3390/rs12142244 ◽

2020 ◽

Vol 12 (14) ◽

pp. 2244

Author(s):

Luis Moya ◽

Erick Mas ◽

Shunichi Koshimura

Keyword(s):

Machine Learning ◽

Real Time ◽

Local Governments ◽

Large Scale ◽

Damage Identification ◽

Remote Sensing Data ◽

Early Response ◽

Training Data ◽

Supervised Machine Learning ◽

A Current

Applications of machine learning on remote sensing data appear to be endless. Its use in damage identification for early response in the aftermath of a large-scale disaster has a specific issue. The collection of training data right after a disaster is costly, time-consuming, and many times impossible. This study analyzes a possible solution to the referred issue: the collection of training data from past disaster events to calibrate a discriminant function. Then the identification of affected areas in a current disaster can be performed in near real-time. The performance of a supervised machine learning classifier to learn from training data collected from the 2018 heavy rainfall at Okayama Prefecture, Japan, and to identify floods due to the typhoon Hagibis on 12 October 2019 at eastern Japan is reported in this paper. The results show a moderate agreement with flood maps provided by local governments and public institutions, and support the assumption that previous disaster information can be used to identify a current disaster in near-real time.

Download Full-text

Machine learning prediction of landslide deformation behaviour using acoustic emission and rainfall measurements

Engineering Geology ◽

10.1016/j.enggeo.2021.106315 ◽

2021 ◽

pp. 106315

Author(s):

Lizheng Deng ◽

Alister Smith ◽

Neil Dixon ◽

Hongyong Yuan

Keyword(s):

Machine Learning ◽

Acoustic Emission ◽

Deformation Behaviour ◽

Landslide Deformation

Download Full-text