Earth Model Building in Real-Time with an Automated Machine Learning Framework - A Midland Basin Example

Author(s):  
Altay Sansal ◽  
Muhlis Unaldi ◽  
Edward Tian ◽  
Gareth Taylor
2018 ◽  
Author(s):  
Soumya Banerjee

We outline an automated computational and machine learning framework that predicts disease severity and stratifies patients. We apply our framework to available clinical data. Our algorithm automatically generates insights and predicts disease severity with minimal operator intervention. The computational framework presented here can be used to stratify patients, predict disease severity and propose novel biomarkers for disease. Insights from machine learning algorithms coupled with clinical data may help guide therapy, personalize treatment and help clinicians understand the change in disease over time. Computational techniques like these can be used in translational medicine in close collaboration with clinicians and healthcare providers. Our models are also interpretable, allowing clinicians with minimal machine learning experience to engage in model building. This work is a step towards automated machine learning in the clinic.


Author(s):  
Ke Wang ◽  
Qingwen Xue ◽  
Jian John Lu

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Because driving data are class-imbalanced, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in recognizing risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of predicted probability.
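
To illustrate the idea of a cost-sensitive cross-entropy loss mentioned in this abstract, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `weighted_cross_entropy`, the `pos_weight` and `neg_weight` parameters, and the toy data are all assumptions made for illustration. In the paper's framework the class weight is a hyperparameter tuned by Bayesian optimization; here it is simply set by hand.

```python
import math

def weighted_cross_entropy(labels, probs, pos_weight=1.0, neg_weight=1.0, eps=1e-12):
    """Mean binary cross-entropy with a separate weight for each class.

    labels: iterable of 0/1 class labels (1 = minority "risky" class).
    probs:  predicted probabilities of class 1.
    pos_weight, neg_weight: hypothetical class weights (assumed names).
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
    return total / len(labels)

# Hypothetical toy data: one risky driver (label 1) among four drivers.
labels = [1, 0, 0, 0]
probs = [0.5, 0.5, 0.5, 0.5]

unweighted = weighted_cross_entropy(labels, probs)
weighted = weighted_cross_entropy(labels, probs, pos_weight=3.0)
```

With `pos_weight` greater than 1, an error on the minority class contributes more to the loss than the same error on the majority class, which is the basic mechanism a cost-sensitive loss relies on.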


2020 ◽  
Author(s):  
Fenglong Yang ◽  
Quan Zou

Due to the concerted efforts to utilize microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems designed to remove the tedium of manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbial classification tasks in a reproducible way. The pipeline is deployed on a web-based platform, and the server is user-friendly, flexible, and designed to scale according to specific requirements. This pipeline exhibits high performance on 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed the GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbial classification tasks for 85 human-disease phenotypes referring to 12,429 metagenomic samples and 38,643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for research in microbiology and algorithm development. Database URL: http://39.100.246.211:8050/Home


2021 ◽  
Author(s):  
Joakim Löfgren ◽  
Dmitry Tarasov ◽  
Taru Koitto ◽  
Patrick Rinke ◽  
Mikhail Balakshin ◽  
...  

Lignin is an abundant biomaterial that currently emerges as a low-value by-product in the pulp and paper industry but could be repurposed for high-value products as part of the ongoing global transition to a sustainable society. To increase lignin's value, rational and efficient approaches to optimizing lignin biorefineries to produce high-value bioproducts are required. Here, we report the optimization of the AquaSolv Omni (AqSO) Biorefinery, a newly introduced biorefinery concept based on hydrothermal pretreatment and solvent extraction. We employ a machine-learning framework based on Bayesian optimization to provide sample-efficient and guided data collection as well as surrogate model building. The surrogate models allow us to map multiple experimental outputs, including the extracted lignin yield and main structural properties obtained by 2D NMR, as functions of the hydrothermal pretreatment reaction severity and temperature. Our results show that with Bayesian optimization, predictive models can be converged with only 21 data points to within a margin of error comparable to the underlying experimental error. By applying a Pareto point analysis, we demonstrate how the predictive models can be used in tandem to identify optimal extraction conditions for concrete applications in lignin valorization.
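
The Pareto point analysis mentioned in this abstract can be sketched as follows. This is a generic illustration, not the authors' code: the function name `pareto_front`, the assumption that every objective is to be maximized, and the toy numbers are all hypothetical. In the paper's setting the objective values would come from the surrogate models' predictions over reaction severity and temperature.

```python
def pareto_front(points):
    """Return the points that no other point dominates.

    Assumes every objective is maximized. A point q dominates p when q is
    at least as good as p in every objective and strictly better in one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front

# Hypothetical (objective_1, objective_2) pairs for five candidate conditions.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(candidates)
```

Each point on the front represents a trade-off where one objective cannot be improved without worsening another, so the choice among them depends on the target application.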

