Automated Machine-Learning Framework Integrating Histopathological and Radiological Information for Predicting IDH1 Mutation Status in Glioma

Diffuse gliomas are the most common malignant primary brain tumors. Identification of isocitrate dehydrogenase 1 (IDH1) mutations aids the diagnostic classification of these tumors and the prediction of their clinical outcomes. While histology continues to play a key role in frozen section diagnosis, as a diagnostic reference, and as a method for monitoring disease progression, recent research has demonstrated that multi-parametric magnetic resonance imaging (MRI) sequences can predict IDH genotypes. In this paper, we aim to improve the prediction accuracy of IDH1 genotypes by integrating multi-modal imaging information from digitized histopathological data, derived from routine histological slide scans, and from MRI sequences including T1-contrast (T1) and T2 fluid-attenuated inversion recovery (T2-FLAIR). We established an automated framework to process, analyze, and integrate the histopathological and radiological information from high-resolution pathology slides and multi-sequence MRI scans. Our machine-learning framework computes multi-level information, including molecular-level, cellular-level, and texture-level information, that is predictive of IDH genotypes. First, an automated pre-processing step was developed to select the regions of interest (ROIs) from pathology slides. Second, to interactively fuse the complementary multimodal information, comprehensive features were extracted from the pathology ROIs and from the segmented tumor regions (enhanced tumor, edema, and non-enhanced tumor) in the MRI sequences. Third, a Random Forest (RF)-based algorithm was employed to identify and quantitatively characterize features of histopathological and radiological imaging origin, respectively. Finally, we integrated the multi-modal imaging features with a machine-learning algorithm and tested the performance of the framework for IDH1 genotyping; we also provided visual and statistical explanations to support the understanding of prediction outcomes. Training and testing experiments on 217 pathologically verified, IDH1-genotyped glioma cases from multiple sources validated that our fully automated machine-learning model predicted IDH1 genotypes with greater accuracy and reliability than models based on radiological imaging data only. The accuracy of IDH1 genotype prediction was 0.90, compared to 0.82 for the radiomics-only result. Thus, the integration of multi-parametric imaging features for automated analysis of cross-modal biomedical data improved the prediction accuracy of glioma IDH1 genotypes.
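The abstract does not say how the pathology and MRI features are combined. As an illustration only, one common option is early fusion: concatenating the per-case feature vectors of both modalities before training a classifier. The sketch below assumes that approach; the function name, case identifiers, and feature values are all hypothetical and are not taken from the paper.

```python
def fuse_features(pathology, radiology):
    """Early fusion (an assumed strategy, not necessarily the paper's):
    concatenate per-case feature vectors for cases present in both modalities."""
    fused = {}
    # Only cases with both a pathology and a radiology vector can be fused.
    for case_id in sorted(pathology.keys() & radiology.keys()):
        fused[case_id] = list(pathology[case_id]) + list(radiology[case_id])
    return fused


# Hypothetical toy data: two pathology features and three MRI features per case.
pathology = {"case_a": [0.12, 0.80], "case_b": [0.33, 0.41]}
radiology = {"case_a": [5.0, 1.5, 2.2], "case_c": [4.1, 0.9, 3.0]}

fused = fuse_features(pathology, radiology)
print(fused)
```

Cases missing one modality are dropped here for simplicity; the fused vectors could then be passed to any classifier, such as a random forest.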

PAIRS AutoGeo: an Automated Machine Learning Framework for Massive Geospatial Data

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378036 ◽

2020 ◽

Author(s):

Wang Zhou ◽

Levente J. Klein ◽

Siyuan Lu

Keyword(s):

Machine Learning ◽

Geospatial Data ◽

Learning Framework ◽

Automated Machine Learning

Risky Driver Recognition with Class Imbalance Data and Automated Machine Learning Framework

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph18147534 ◽

2021 ◽

Vol 18 (14) ◽

pp. 7534

Author(s):

Ke Wang ◽

Qingwen Xue ◽

Jian John Lu

Keyword(s):

Machine Learning ◽

High Risk ◽

Loss Function ◽

Class Imbalance ◽

Support Vector ◽

Trajectory Data ◽

Recognition Model ◽

Learning Framework ◽

Sampling Cost ◽

Automated Machine Learning

Identifying high-risk drivers before an accident happens is necessary for traffic accident control and prevention. Because driving data are class-imbalanced, high-risk samples, as the minority class, are usually handled poorly by standard classification algorithms. Instead of applying preset sampling or cost-sensitive learning, this paper proposes a novel automated machine learning framework that simultaneously and automatically searches for the optimal sampling method, cost-sensitive loss function, and probability calibration to handle the class-imbalance problem in the recognition of risky drivers. The hyperparameters that control the sampling ratio and class weight, along with other hyperparameters, are optimized by Bayesian optimization. To demonstrate the performance of the proposed automated learning framework, we establish a risky driver recognition model as a case study, using video-extracted vehicle trajectory data of 2427 private cars on a German highway. Based on rear-end collision risk evaluation, only 4.29% of all drivers are labeled as risky drivers. The inputs of the recognition model are the discrete Fourier transform coefficients of the target vehicle’s longitudinal speed, its lateral speed, and the gap between the target vehicle and its preceding vehicle. Across 12 sampling methods, 2 cost-sensitive loss functions, and 2 probability calibration methods, the result of automated machine learning is consistent with manual searching but much more computationally efficient. We find that the combination of Support Vector Machine-based Synthetic Minority Oversampling TEchnique (SVMSMOTE) sampling, a cost-sensitive cross-entropy loss function, and isotonic regression can significantly improve the recognition ability and reduce the error of the predicted probability.
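To make the idea of a cost-sensitive cross-entropy loss concrete, here is a minimal sketch of a class-weighted binary cross-entropy. It is an illustration under assumptions, not the paper's implementation: the function name, the weights, and the choice to average over the number of samples are all hypothetical.

```python
import math


def weighted_cross_entropy(y_true, p_pred, pos_weight=1.0, neg_weight=1.0, eps=1e-12):
    """Binary cross-entropy where errors on each class are scaled by a class weight.

    Averaging over the sample count (rather than the weight sum) is an assumption.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -neg_weight * math.log(1.0 - p)
    return total / len(y_true)


# Toy data: one minority-class (risky) sample among four, all predicted at 0.5.
y_true = [1, 0, 0, 0]
p_pred = [0.5, 0.5, 0.5, 0.5]

plain = weighted_cross_entropy(y_true, p_pred)
weighted = weighted_cross_entropy(y_true, p_pred, pos_weight=10.0)
print(plain, weighted)
```

Raising `pos_weight` makes an uncertain prediction on the minority class cost more, which is the lever a hyperparameter search over class weights would tune.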

Automated clinical computational biology: an interpretable machine learning framework to predict disease severity and stratify patients from clinical data

10.31219/osf.io/9xc2j ◽

2018 ◽

Author(s):

Soumya Banerjee

Keyword(s):

Machine Learning ◽

Disease Severity ◽

Clinical Data ◽

Model Building ◽

Learning Experience ◽

Machine Learning Algorithms ◽

Close Collaboration ◽

Learning Framework ◽

Novel Biomarkers ◽

Automated Machine Learning

We outline an automated computational and machine learning framework that predicts disease severity and stratifies patients. We apply our framework to available clinical data. Our algorithm automatically generates insights and predicts disease severity with minimal operator intervention. The computational framework presented here can be used to stratify patients, predict disease severity and propose novel biomarkers for disease. Insights from machine learning algorithms coupled with clinical data may help guide therapy, personalize treatment and help clinicians understand the change in disease over time. Computational techniques like these can be used in translational medicine in close collaboration with clinicians and healthcare providers. Our models are also interpretable, allowing clinicians with minimal machine learning experience to engage in model building. This work is a step towards automated machine learning in the clinic.

A Machine Learning Framework for Edge Computing to Improve Prediction Accuracy in Mobile Health Monitoring

Computational Science and Its Applications – ICCSA 2019 - Lecture Notes in Computer Science ◽

10.1007/978-3-030-24302-9_30 ◽

2019 ◽

pp. 417-431

Author(s):

Sigdel Shree Ram ◽

Bernady Apduhan ◽

Norio Shiratori

Keyword(s):

Machine Learning ◽

Mobile Health ◽

Health Monitoring ◽

Prediction Accuracy ◽

Edge Computing ◽

Learning Framework ◽

Mobile Health Monitoring

Su1816 - Agreement of CT Imaging Features of Crohn's Disease Between Radiologists and Automated Machine Learning Image Analysis

Gastroenterology ◽

10.1016/s0016-5085(18)32162-0 ◽

2018 ◽

Vol 154 (6) ◽

pp. S-595

Author(s):

Ryan W. Stidham ◽

Binu Enchakalody ◽

Akbar K. Waljee ◽

Peter D. Higgins ◽

Stewart Wang ◽

...

Keyword(s):

Machine Learning ◽

Image Analysis ◽

Crohn's Disease ◽

Ct Imaging ◽

Imaging Features ◽

Automated Machine Learning

Cardea: An Open Automated Machine Learning Framework for Electronic Health Records

2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA) ◽

10.1109/dsaa49011.2020.00068 ◽

2020 ◽

Author(s):

Sarah Alnegheimish ◽

Najat Alrashed ◽

Faisal Aleissa ◽

Shahad Althobaiti ◽

Dongyu Liu ◽

...

Keyword(s):

Machine Learning ◽

Electronic Health Records ◽

Health Records ◽

Learning Framework ◽

Automated Machine Learning ◽

Electronic Health

Machine Learning Tools For off-Target Early Safety Assessment of Small Molecules In Drug Discovery (Single Task Neural Networks Vs Automated Machine Learning)

10.21203/rs.3.rs-957525/v1 ◽

2021 ◽

Author(s):

Doha Naga ◽

Wolfgang Muster ◽

Eunice Musvasva ◽

Gerhard F. Ecker

Keyword(s):

Machine Learning ◽

Drug Discovery ◽

Pharmaceutical Companies ◽

Learning Tools ◽

Safety Issues ◽

Learning Framework ◽

Automated Machine Learning ◽

Preclinical Safety

Unpredicted drug safety issues constitute the majority of failures in the pharmaceutical industry according to several studies [1-3]. Some of these preclinical safety issues could be attributed to the non-selective binding of compounds to targets other than their intended therapeutic target, causing undesired adverse events. Consequently, pharmaceutical companies, including Roche, routinely run in-vitro safety screens to detect off-target activities prior to preclinical and clinical studies. Here we present a machine learning framework that aims to predict activities against our in-house 50 off-target panel [4] for ~4000 compounds, directly from their structure. This framework is intended to guide chemists in the drug design process prior to synthesis and to accelerate drug discovery. It incorporates different ML approaches such as deep learning and automated machine learning. Outcomes from the different methods are compared in terms of efficiency and efficacy. The most important challenges and factors impacting model construction and performance, along with suggestions on how to overcome such challenges, are also discussed.

6mAPred-MSFF: A Deep Learning Model for Predicting DNA N6-Methyladenine Sites across Species Based on a Multi-Scale Feature Fusion Mechanism

Applied Sciences ◽

10.3390/app11167731 ◽

2021 ◽

Vol 11 (16) ◽

pp. 7731

Author(s):

Rao Zeng ◽

Minghong Liao

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Prediction Accuracy ◽

Cross Validation ◽

Feature Fusion ◽

Experimental Comparison ◽

Scale Feature ◽

Learning Framework ◽

Multi Scale ◽

Fold Cross Validation

DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulation processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness limit their usability in epigenetic studies, which implies a great need to develop new computational methods for this problem. In this paper, we developed a novel computational predictor, 6mAPred-MSFF, which is a deep learning framework based on a multi-scale feature fusion mechanism to identify 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build lightweight and deep neural networks. Compared to existing predictors using traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features and better captures the characteristics of 6mA sites. In benchmarking comparisons, our deep learning method outperforms the state-of-the-art methods in the 5-fold cross-validation test on the seven datasets of six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, our proposed 6mAPred-MSFF achieves a sensitivity and specificity of 97.88% and 94.64%, respectively, in the 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained with the rice data predicts well the 6mA sites of other five species: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster with a prediction accuracy 98.51%, 93.02%, and 91.53%, respectively. Moreover, via experimental comparison, we explored the performance impact of training and testing our proposed model under different encoding schemes and feature descriptors.
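For readers unfamiliar with the two reported metrics, sensitivity is the fraction of true 6mA sites that are predicted as such, TP / (TP + FN), and specificity is the fraction of non-6mA sites correctly rejected, TN / (TN + FP). The sketch below computes both from binary labels; the function name and the toy labels are hypothetical and unrelated to the paper's data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # positives found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # positives missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # negatives rejected
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy labels: four positive sites and four negative sites.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(sens, spec)  # 0.75 0.5
```

In a 5-fold cross-validation setting, these values would typically be computed on each held-out fold and then averaged.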
