Automated clinical computational biology: an interpretable machine learning framework to predict disease severity and stratify patients from clinical data

Mapping Intimacies ◽

10.31219/osf.io/9xc2j ◽

2018 ◽

Author(s):

soumya banerjee

Keyword(s):

Machine Learning ◽

Disease Severity ◽

Clinical Data ◽

Model Building ◽

Learning Experience ◽

Machine Learning Algorithms ◽

Close Collaboration ◽

Learning Framework ◽

Novel Biomarkers ◽

Automated Machine Learning

We outline an automated computational and machine learning framework that predicts disease severity andstratifies patients. We apply our framework to available clinical data. Our algorithm automatically generatesinsights and predicts disease severity with minimal operator intervention. The computational frameworkpresented here can be used to stratify patients, predict disease severity and propose novel biomarkers fordisease. Insights from machine learning algorithms coupled with clinical data may help guide therapy,personalize treatment and help clinicians understand the change in disease over time. Computationaltechniques like these can be used in translational medicine in close collaboration with clinicians and healthcareproviders. Our models are also interpretable, allowing clinicians with minimal machine learning experience toengage in model building. This work is a step towards automated machine learning in the clinic.

Download Full-text

Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development

JMIR Medical Informatics ◽

10.2196/25884 ◽

2021 ◽

Vol 9 (4) ◽

pp. e25884

Author(s):

Sakifa Aktar ◽

Md Martuza Ahamad ◽

Md Rashed-Al-Mahfuz ◽

AKM Azad ◽

Shahadat Uddin ◽

...

Keyword(s):

Machine Learning ◽

Disease Severity ◽

Clinical Data ◽

Peripheral Blood ◽

Machine Learning Algorithms ◽

Accurate Prediction ◽

Gradient Boosting ◽

Support Vector ◽

Blood Samples ◽

Severity Prediction

Background Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. Objective Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. Methods We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. Results Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19–positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90% for disease severity prediction. Conclusions We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment.

Download Full-text

Earth Model Building in Real-Time with an Automated Machine Learning Framework - A Midland Basin Example

Proceedings of the 9th Unconventional Resources Technology Conference ◽

10.15530/urtec-2021-5659 ◽

2021 ◽

Author(s):

Altay Sansal ◽

Muhlis Unaldi ◽

Edward Tian ◽

Gareth Taylor

Keyword(s):

Machine Learning ◽

Real Time ◽

Model Building ◽

Earth Model ◽

Midland Basin ◽

Learning Framework ◽

Automated Machine Learning

Download Full-text

Machine Learning Approach to Predicting COVID-19 Disease Severity Based on Clinical Blood Test Data: Statistical Analysis and Model Development (Preprint)

10.2196/preprints.25884 ◽

2020 ◽

Author(s):

Sakifa Aktar ◽

Md Martuza Ahamad ◽

Md Rashed-Al-Mahfuz ◽

AKM Azad ◽

Shahadat Uddin ◽

...

Keyword(s):

Machine Learning ◽

Disease Severity ◽

Clinical Data ◽

Peripheral Blood ◽

Machine Learning Algorithms ◽

Accurate Prediction ◽

Gradient Boosting ◽

Support Vector ◽

Blood Samples ◽

Severity Prediction

BACKGROUND Accurate prediction of the disease severity of patients with COVID-19 would greatly improve care delivery and resource allocation and thereby reduce mortality risks, especially in less developed countries. Many patient-related factors, such as pre-existing comorbidities, affect disease severity and can be used to aid this prediction. OBJECTIVE Because rapid automated profiling of peripheral blood samples is widely available, we aimed to investigate how data from the peripheral blood of patients with COVID-19 can be used to predict clinical outcomes. METHODS We investigated clinical data sets of patients with COVID-19 with known outcomes by combining statistical comparison and correlation methods with machine learning algorithms; the latter included decision tree, random forest, variants of gradient boosting machine, support vector machine, k-nearest neighbor, and deep learning methods. RESULTS Our work revealed that several clinical parameters that are measurable in blood samples are factors that can discriminate between healthy people and COVID-19–positive patients, and we showed the value of these parameters in predicting later severity of COVID-19 symptoms. We developed a number of analytical methods that showed accuracy and precision scores >90% for disease severity prediction. CONCLUSIONS We developed methodologies to analyze routine patient clinical data that enable more accurate prediction of COVID-19 patient outcomes. With this approach, data from standard hospital laboratory analyses of patient blood could be used to identify patients with COVID-19 who are at high risk of mortality, thus enabling optimization of hospital facilities for COVID-19 treatment.

Download Full-text

PAIRS AutoGeo: an Automated Machine Learning Framework for Massive Geospatial Data

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378036 ◽

2020 ◽

Author(s):

Wang Zhou ◽

Levente J. Klein ◽

Siyuan Lu

Keyword(s):

Machine Learning ◽

Geospatial Data ◽

Learning Framework ◽

Automated Machine Learning

Download Full-text

Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18147534 ◽

2021 ◽

Vol 18 (14) ◽

pp. 7534

Author(s):

Ke Wang ◽

Qingwen Xue ◽

Jian John Lu

Keyword(s):

Machine Learning ◽

High Risk ◽

Loss Function ◽

Class Imbalance ◽

Support Vector ◽

Trajectory Data ◽

Recognition Model ◽

Learning Framework ◽

Sampling Cost ◽

Automated Machine Learning

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Due to the class-imbalance nature of driving data, high-risk samples as the minority class are usually ill-treated by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling, cost-sensitive loss function, and probability calibration to handle class-imbalance problem in recognition of risky drivers. The hyperparameters that control sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of target vehicle’s longitudinal speed, lateral speed, and the gap between the target vehicle and its preceding vehicle. Among 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computation-efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of predicted probability.

Download Full-text

Elimination of Irrelevant Features and Heart Disease Recognition by Employing Machine Learning Algorithms using Clinical Data

2020 17th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP) ◽

10.1109/iccwamtip51612.2020.9317351 ◽

2020 ◽

Author(s):

Huaiyu Wen ◽

Sufang Li ◽

Amin Ul Haq ◽

Jian Ping Li ◽

Rajesh Kumar ◽

...

Keyword(s):

Machine Learning ◽

Heart Disease ◽

Clinical Data ◽

Learning Algorithms ◽

Machine Learning Algorithms

Download Full-text

Deep Learning Approaches for Sentiment Analysis Challenges and Future Issues

10.4018/978-1-7998-8161-2.ch003 ◽

2022 ◽

pp. 27-50

Author(s):

Rajalaxmi Prabhu B. ◽

Seema S.

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Model Building ◽

Large Data ◽

Machine Learning Algorithms ◽

Large Data Sets ◽

Data Sets ◽

Learning Approaches ◽

Learning Techniques ◽

Important Challenge

A lot of user-generated data is available these days from huge platforms, blogs, websites, and other review sites. These data are usually unstructured. Analyzing sentiments from these data automatically is considered an important challenge. Several machine learning algorithms are implemented to check the opinions from large data sets. A lot of research has been undergone in understanding machine learning approaches to analyze sentiments. Machine learning mainly depends on the data required for model building, and hence, suitable feature exactions techniques also need to be carried. In this chapter, several deep learning approaches, its challenges, and future issues will be addressed. Deep learning techniques are considered important in predicting the sentiments of users. This chapter aims to analyze the deep-learning techniques for predicting sentiments and understanding the importance of several approaches for mining opinions and determining sentiment polarity.

Download Full-text

mAML: an automated machine learning pipeline with a microbiome repository for human disease classification

10.1101/2020.02.11.943316 ◽

2020 ◽

Author(s):

Fenglong Yang ◽

Quan Zou

Keyword(s):

Machine Learning ◽

Human Disease ◽

High Performance ◽

Model Building ◽

Disease Classification ◽

Benchmark Datasets ◽

Automated Machine Learning ◽

Classification Tasks ◽

Interpretable Models ◽

Multi Class Classification

AbstractDue to the concerted efforts to utilize the microbial features to improve disease prediction capabilities, automated machine learning (AutoML) systems designed to get rid of the tediousness in manually performing ML tasks are in great demand. Here we developed mAML, an ML model-building pipeline, which can automatically and rapidly generate optimized and interpretable models for personalized microbial classification tasks in a reproducible way. The pipeline is deployed on a web-based platform and the server is user-friendly, flexible, and has been designed to be scalable according to the specific requirements. This pipeline exhibits high performance for 13 benchmark datasets including both binary and multi-class classification tasks. In addition, to facilitate the application of mAML and expand the human disease-related microbiome learning repository, we developed GMrepo ML repository (GMrepo Microbiome Learning repository) from the GMrepo database. The repository involves 120 microbial classification tasks for 85 human-disease phenotypes referring to 12,429 metagenomic samples and 38,643 amplicon samples. The mAML pipeline and the GMrepo ML repository are expected to be important resources for researches in microbiology and algorithm developments.Database URLhttp://39.100.246.211:8050/Home

Download Full-text

Predicting Alert Source Device using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.d1526.079920 ◽

2020 ◽

Vol 9 (9) ◽

pp. 1-10

Keyword(s):

Neural Network ◽

Machine Learning ◽

Model Building ◽

Learning Algorithm ◽

Learning Algorithms ◽

Research Work ◽

Machine Learning Algorithms ◽

Training Dataset ◽

Imbalanced Dataset ◽

Daunting Task

In a large distributed virtualized environment, predicting the alerting source from its text seems to be daunting task. This paper explores the option of using machine learning algorithm to solve this problem. Unfortunately, our training dataset is highly imbalanced. Where 96% of alerting data is reported by 24% of alerting sources. This is the expected dataset in any live distributed virtualized environment, where new version of device will have relatively less alert compared to older devices. Any classification effort with such imbalanced dataset present different set of challenges compared to binary classification. This type of skewed data distribution makes conventional machine learning less effective, especially while predicting the minority device type alerts. Our challenge is to build a robust model which can cope with this imbalanced dataset and achieves relative high level of prediction accuracy. This research work stared with traditional regression and classification algorithms using bag of words model. Then word2vec and doc2vec models are used to represent the words in vector formats, which preserve the sematic meaning of the sentence. With this alerting text with similar message will have same vector form representation. This vectorized alerting text is used with Logistic Regression for model building. This yields better accuracy, but the model is relatively complex and demand more computational resources. Finally, simple neural network is used for this multi-class text classification problem domain by using keras and tensorflow libraries. A simple two layered neural network yielded 99 % accuracy, even though our training dataset was not balanced. This paper goes through the qualitative evaluation of the different machine learning algorithms and their respective result. Finally, two layered deep learning algorithms is selected as final solution, since it takes relatively less resource and time with better accuracy values.

Download Full-text

Lignin Biorefinery Optimization Through Machine Learning

10.33774/chemrxiv-2021-6r888 ◽

2021 ◽

Author(s):

Joakim Löfgren ◽

Dmitry Tarasov ◽

Taru Koitto ◽

Patrick Rinke ◽

Mikhail Balakshin ◽

...

Keyword(s):

Machine Learning ◽

Predictive Models ◽

Model Building ◽

Paper Industry ◽

2D Nmr ◽

Hydrothermal Pretreatment ◽

Bayesian Optimization ◽

Learning Framework ◽

Point Analysis ◽

Data Points

Lignin is an abundant biomaterial that currently emerges as a low value by-product in the pulp and paper industry but could be repurposed for high-value products as part of the ongoing global transition to a sustainable society. To increase lignins value, rational and efficient approaches to optimizing lignin biorefineries to produce high value bioproducts are required. Here, we report the optimization of the AquaSolv Omni (AqSO) Biorefinery, a newly introduced biorefinery concept based on hydrothermal pretreatment and solvent extraction. We employ a machine-learning framework based on Bayesian optimization, to provide sample-efficient and guided data collection as well as surrogate model building. The surrogate models allow us to map multiple experimental outputs, including the extracted lignin yield and main structural properties obtained by 2D NMR, as functions of the hydrothermal pretreatment reaction severity and temperature. Our results show that with Bayesian optimization, predictive models can be converged with only 21 data points to within a margin of error comparable to the underlying experimental error. By applying a Pareto point analysis, we demonstrate how the predictive models can be used in tandem to identify optimal extraction conditions for concrete applications in lignin valorization.

Download Full-text