DeepProg: an ensemble of deep-learning and machine-learning models for prognosis prediction using multi-omics data

Author(s):  
Olivier Poirion ◽  
Kumardeep Chaudhary ◽  
Sijia Huang ◽  
Lana X. Garmire

Abstract: Background: Prognosis (survival) prediction of patients is important for disease management. Multi-omics data are good resources for survival prediction; however, they are difficult to integrate computationally. Results: We introduce DeepProg, a new computational framework that robustly predicts patient survival subtypes based on multiple types of omics data. It employs an ensemble of deep-learning and machine-learning approaches to achieve high performance. We apply DeepProg to 32 cancer datasets from TCGA and discover that most cancers have two optimal survival subtypes. Patient survival risk-stratification using DeepProg is significantly better than that of another multi-omics data integration method, Similarity Network Fusion (p-value = 7.9e-7). DeepProg shows excellent predictive accuracy in external validation cohorts, exemplified by two liver cancer datasets (C-index 0.73 and 0.80) and five breast cancer datasets (C-index 0.68-0.73). Further comprehensive pan-cancer analysis unveils genomic signatures common to the poorest-survival subtypes, with genes enriched in extracellular matrix modeling, immune deregulation, and mitosis processes. Conclusions: DeepProg is a powerful and generic computational framework for predicting patient survival risks. DeepProg is freely available for non-commercial use at: http://garmiregroup.org/DeepProg
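The C-index figures quoted above (0.73-0.80) measure how often a model's predicted risk ordering agrees with observed survival. A minimal sketch of Harrell's concordance index follows; the function name and toy cohort are illustrative assumptions, not code from DeepProg.

```python
# Sketch of Harrell's concordance index (C-index), the survival metric
# reported in the abstract. Data below are a made-up toy cohort.

def c_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risk
    ordering agrees with their observed survival ordering."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair (i, j) is comparable if patient i had the event first.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1        # higher risk died earlier: agree
                elif risks[i] == risks[j]:
                    concordant += 0.5      # ties count as half
    return concordant / comparable

# Survival times (months), event indicator (1 = death), predicted risk.
times  = [5, 10, 20, 30]
events = [1, 1, 0, 1]
risks  = [0.9, 0.7, 0.2, 0.1]
print(c_index(times, events, risks))  # 1.0: risk order matches outcomes
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect concordance, which is why the reported 0.73-0.80 indicates substantial discriminative power.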

2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Olivier B. Poirion ◽  
Zheng Jing ◽  
Kumardeep Chaudhary ◽  
Sijia Huang ◽  
Lana X. Garmire

Abstract: Multi-omics data are good resources for prognosis and survival prediction; however, these are difficult to integrate computationally. We introduce DeepProg, a novel ensemble framework of deep-learning and machine-learning approaches that robustly predicts patient survival subtypes using multi-omics data. It identifies two optimal survival subtypes in most cancers and yields significantly better risk-stratification than other multi-omics integration methods. DeepProg is highly predictive, exemplified by two liver cancer (C-index 0.73-0.80) and five breast cancer datasets (C-index 0.68-0.73). Pan-cancer analysis associates common genomic signatures in poor survival subtypes with extracellular matrix modeling, immune deregulation, and mitosis processes. DeepProg is freely available at https://github.com/lanagarmire/DeepProg


Cancers ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 3047
Author(s):  
Xiaoyu Zhang ◽  
Yuting Xing ◽  
Kai Sun ◽  
Yike Guo

High-dimensional omics data contain intrinsic biomedical information that is crucial for personalised medicine. Nevertheless, it is challenging to capture this information from genome-wide data because of the large number of molecular features and the small number of available samples, a problem known as "the curse of dimensionality" in machine learning. To tackle this problem and pave the way for machine learning-aided precision medicine, we proposed a unified multi-task deep learning framework named OmiEmbed to capture biomedical information from high-dimensional omics data with deep embedding and downstream task modules. The deep embedding module learnt an omics embedding that mapped multiple omics data types into a latent space with lower dimensionality. Based on this new representation of multi-omics data, different downstream task modules were trained simultaneously and efficiently with the multi-task strategy to predict the comprehensive phenotype profile of each sample. OmiEmbed supports multiple tasks for omics data, including dimensionality reduction, tumour type classification, multi-omics integration, demographic and clinical feature reconstruction, and survival prediction. The framework outperformed other methods on all three types of downstream tasks and achieved better performance with the multi-task strategy than when training them individually. OmiEmbed is a powerful and unified framework that can be widely adapted to various applications of high-dimensional omics data and has great potential to facilitate more accurate and personalised clinical decision making.
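The two-stage design described above can be sketched in a few lines: a shared embedding projects the concatenated omics features into a low-dimensional latent vector, and several task heads all read from that same vector. Dimensions, weights, and head names here are illustrative assumptions, not OmiEmbed's actual architecture.

```python
# Minimal sketch of a shared embedding feeding multiple task heads,
# assuming untrained random weights purely to show the data flow.
import random

random.seed(0)

def linear(x, w):
    """Plain matrix-vector product: one dense layer without bias."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

n_in, n_latent = 12, 3        # e.g. 12 omics features -> 3-dim latent space
x = [random.random() for _ in range(n_in)]   # one sample's omics profile
w_embed = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_latent)]

z = linear(x, w_embed)        # shared low-dimensional representation

# Two downstream heads trained on the same embedding (multi-task strategy):
w_tumour = [[random.gauss(0, 1) for _ in range(n_latent)] for _ in range(2)]
w_risk   = [[random.gauss(0, 1) for _ in range(n_latent)]]

tumour_logits = linear(z, w_tumour)   # e.g. tumour-type classification head
risk_score    = linear(z, w_risk)[0]  # e.g. survival-risk regression head
print(len(z), len(tumour_logits))
```

In the real framework the embedding and all heads are trained jointly, so gradients from every task shape the shared latent space; the sketch only shows the forward data flow.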


2021 ◽  
Vol 4 ◽  
Author(s):  
Kyubum Lee ◽  
John H. Lockhart ◽  
Mengyu Xie ◽  
Ritu Chaudhary ◽  
Robbert J. C. Slebos ◽  
...  

The tumor immune microenvironment (TIME) encompasses many heterogeneous cell types that engage in extensive crosstalk among the cancer, immune, and stromal components. The spatial organization of these cell types in the TIME can serve as a biomarker for predicting drug response, prognosis, and metastasis. Recently, deep learning approaches have been widely applied to digital histopathology images for cancer diagnosis and prognosis, and some recent approaches have attempted to integrate spatial and molecular omics data to better characterize the TIME. In this review we focus on machine learning-based digital histopathology image analysis methods for characterizing the tumor ecosystem. We consider three scales of histopathological analysis at which machine learning can operate: whole slide image (WSI)-level, region of interest (ROI)-level, and cell-level. We systematically review the various machine learning methods at these three scales, with a focus on cell-level analysis, and provide a perspective on a workflow for generating cell-level training data sets that uses immunohistochemistry markers to "weakly label" the cell types. We describe common steps in the workflow of preparing the data, as well as some limitations of this approach. Finally, we discuss future opportunities for integrating molecular omics data with digital histopathology images to characterize the tumor ecosystem.
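The "weak labeling" workflow mentioned above can be sketched as a simple rule that assigns each segmented cell a type from its immunohistochemistry (IHC) marker intensities. The marker names and thresholds below are illustrative assumptions, not values from the review.

```python
# Sketch: weakly label segmented cells from per-marker stain intensities
# (normalised to 0..1). Markers and cut-offs are hypothetical examples.

def weak_label(cell, threshold=0.5):
    """Assign a coarse cell-type label from IHC marker intensities."""
    if cell.get("CD8", 0) > threshold:
        return "T cell"          # CD8-positive -> cytotoxic T cell
    if cell.get("panCK", 0) > threshold:
        return "tumour cell"     # pan-cytokeratin marks epithelial/tumour cells
    return "other"               # ambiguous or unstained stroma

cells = [
    {"CD8": 0.8, "panCK": 0.1},
    {"CD8": 0.1, "panCK": 0.9},
    {"CD8": 0.2, "panCK": 0.3},
]
labels = [weak_label(c) for c in cells]
print(labels)  # ['T cell', 'tumour cell', 'other']
```

Labels produced this way are noisy (hence "weak"), which is one of the workflow limitations the review discusses: threshold choices and overlapping marker expression both introduce label errors into the training set.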


Cancers ◽  
2020 ◽  
Vol 12 (6) ◽  
pp. 1606
Author(s):  
Jose M. Castillo T. ◽  
Muhammad Arif ◽  
Wiro J. Niessen ◽  
Ivo G. Schoots ◽  
Jifke F. Veenland

Significant prostate carcinoma (sPCa) classification based on MRI using radiomics or deep learning approaches has gained much interest, due to its potential application in assisting clinical decision-making. Objective: To systematically review the literature (i) to determine which algorithms are most frequently used for sPCa classification, (ii) to investigate whether there exists a relation between the performance and the method or the MRI sequences used, (iii) to assess which study design factors affect performance on sPCa classification, and (iv) to assess whether performance has been evaluated in a clinical setting. Methods: The databases Embase and Ovid MEDLINE were searched for studies describing machine learning or deep learning classification methods discriminating between significant and nonsignificant PCa on multiparametric MRI that performed a valid validation procedure. Quality was assessed by the modified radiomics quality score. We computed the median area under the receiver operating curve (AUC) and the interquartile range across all methods. Results: From 2846 potentially relevant publications, 27 were included. The most frequently used algorithms in the literature for PCa classification are logistic regression (22%) and convolutional neural networks (CNNs) (22%). The median AUC was 0.79 (interquartile range: 0.77-0.87). No significant effect of the number of included patients, image sequences, or reference standard on the reported performance was found. Three studies described an external validation, and none of the papers described a validation in a prospective clinical trial. Conclusions: To unlock the promising potential of machine and deep learning approaches, validation studies and prospective clinical studies should be performed with an established protocol to assess the added value in decision-making.
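The summary statistic the review reports, a median AUC with its interquartile range across studies, is straightforward to compute. A minimal sketch follows; the AUC values are made-up placeholders, not the review's extracted data.

```python
# Sketch: median and interquartile range (IQR) of per-study AUCs,
# using Python's standard-library statistics module.
import statistics

aucs = [0.72, 0.77, 0.79, 0.83, 0.87, 0.91]   # hypothetical per-study AUCs

median_auc = statistics.median(aucs)
q1, _, q3 = statistics.quantiles(aucs, n=4)   # quartile cut points
print(f"median AUC {median_auc:.2f} (IQR {q1:.2f}-{q3:.2f})")
```

Reporting the IQR alongside the median, as the review does, conveys the spread of performance across heterogeneous study designs without being distorted by a single outlying study.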


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2514
Author(s):  
Tharindu Kaluarachchi ◽  
Andrew Reis ◽  
Suranga Nanayakkara

Since Deep Learning (DL) regained popularity, the Artificial Intelligence (AI) and Machine Learning (ML) field has undergone rapid growth in both research and real-world application development. Deep Learning has introduced algorithmic complexity, and researchers and users have raised concerns regarding the usability and adoptability of Deep Learning systems. These concerns, coupled with increasing human-AI interaction, have created the emerging field of Human-Centered Machine Learning (HCML). We present this review paper as an overview and analysis of existing work in HCML related to DL. Firstly, we collaborated with domain experts to develop a working definition of HCML. Secondly, through a systematic literature review, we analyze and classify 162 publications that fall within HCML. Our classification is based on aspects including contribution type, application area, and the human categories in focus. Finally, we analyze the topology of the HCML landscape by identifying research gaps, highlighting conflicting interpretations, addressing current challenges, and presenting future HCML research opportunities.


Electronics ◽  
2021 ◽  
Vol 10 (14) ◽  
pp. 1694
Author(s):  
Mathew Ashik ◽  
A. Jyothish ◽  
S. Anandaram ◽  
P. Vinod ◽  
Francesco Mercaldo ◽  
...  

Malware is one of the most significant threats in today's computing world, since the number of websites distributing malware is increasing at a rapid rate. Malware analysis and prevention methods are increasingly necessary for computer systems connected to the Internet. Such software exploits a system's vulnerabilities to steal valuable information without the user's knowledge and stealthily sends it to remote servers controlled by attackers. Traditionally, anti-malware products use signatures to detect known malware; however, the signature-based method does not scale to detecting obfuscated and packed malware. Because the cause of a problem is often best understood by studying the structural aspects of a program, such as mnemonics, instruction opcodes, and API calls, we investigate in this paper the relevance of these features in unpacked malicious and benign executables for classifying an executable. Prominent features are extracted using Minimum Redundancy Maximum Relevance (mRMR) and Analysis of Variance (ANOVA). Experiments were conducted on four datasets using machine learning and deep learning approaches such as Support Vector Machine (SVM), Naïve Bayes, J48, Random Forest (RF), and XGBoost. In addition, we evaluate the performance of a collection of deep neural networks, namely a deep dense network, a one-dimensional convolutional neural network (1D-CNN), and a CNN-LSTM, in classifying unknown samples, and we observed promising results using APIs and system calls. Combining APIs/system calls with static features attained a marginal performance improvement compared to models trained only on dynamic features. Moreover, to improve accuracy, we implemented our solution using distinct deep learning methods and demonstrated a fine-tuned deep neural network that achieved F1-scores of 99.1% and 98.48% on Dataset-2 and Dataset-3, respectively.
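One of the filters the paper names, ANOVA, ranks a feature by how strongly its values separate the classes. A minimal sketch of the one-way ANOVA F-statistic follows; the opcode-frequency values are illustrative, not the paper's data.

```python
# Sketch: one-way ANOVA F-statistic for ranking a single feature by how
# well it separates benign from malicious executables. Toy data only.
import statistics

def anova_f(groups):
    """F = between-group mean square / within-group mean square."""
    all_vals = [v for g in groups for v in g]
    grand = statistics.fmean(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (statistics.fmean(g) - grand) ** 2
                     for g in groups)
    ss_within = sum(sum((v - statistics.fmean(g)) ** 2 for v in g)
                    for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Frequency of one opcode in benign vs. malicious executables:
benign    = [1.0, 1.2, 0.9, 1.1]
malicious = [3.0, 3.3, 2.8, 3.1]
f = anova_f([benign, malicious])
print(f > 10)  # a large F suggests the feature discriminates the classes
```

Features are then ranked by F and the top-scoring ones kept; mRMR goes further by also penalising features that are redundant with those already selected.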


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3068
Author(s):  
Soumaya Dghim ◽  
Carlos M. Travieso-González ◽  
Radim Burget

The use of image processing tools, machine learning, and deep learning approaches has become very useful and robust in recent years. This paper introduces the detection of Nosema disease, which is considered one of the most economically significant diseases today. This work presents a solution for recognizing and identifying Nosema cells among the other objects present in microscopic images. Two main strategies are examined. The first strategy uses image processing tools to extract the most valuable information and features from the dataset of microscopic images; then, machine learning methods such as an artificial neural network (ANN) and a support vector machine (SVM) are applied to detect and classify Nosema disease cells. The second strategy explores deep learning and transfer learning. Several approaches were examined, including a convolutional neural network (CNN) classifier and several transfer learning methods (AlexNet, VGG-16 and VGG-19), which were fine-tuned and applied to the object sub-images in order to distinguish Nosema images from the other object images. The best accuracy, 96.25%, was reached by the VGG-16 pre-trained neural network.


Cancers ◽  
2021 ◽  
Vol 13 (11) ◽  
pp. 2764
Author(s):  
Xin Yu Liew ◽  
Nazia Hameed ◽  
Jeremie Clos

A computer-aided diagnosis (CAD) expert system is a powerful tool to efficiently assist a pathologist in achieving an early diagnosis of breast cancer. This process identifies the presence of cancer in breast tissue samples and the distinct cancer stages. In a standard CAD system, the main pipeline involves image pre-processing, segmentation, feature extraction, feature selection, classification, and performance evaluation. In this review paper, we survey the existing state-of-the-art machine learning approaches applied at each stage, covering both conventional and deep learning methods, compare the methods, and provide technical details with their advantages and disadvantages. The aims are to investigate the impact of CAD systems using histopathology images, to investigate deep learning methods that outperform conventional methods, and to provide a summary for future researchers to analyse and improve the existing techniques. Lastly, we discuss the research gaps in existing machine learning approaches for implementation and propose future direction guidelines for upcoming researchers.


2021 ◽  
Vol 49 ◽  
pp. 107739
Author(s):  
Parminder S. Reel ◽  
Smarti Reel ◽  
Ewan Pearson ◽  
Emanuele Trucco ◽  
Emily Jefferson

2020 ◽  
Vol 22 (Supplement_2) ◽  
pp. ii203-ii203
Author(s):  
Alexander Hulsbergen ◽  
Yu Tung Lo ◽  
Vasileios Kavouridis ◽  
John Phillips ◽  
Timothy Smith ◽  
...  

Abstract INTRODUCTION: Survival prediction in brain metastases (BMs) remains challenging. Current prognostic models have been created and validated almost exclusively with data from patients receiving radiotherapy only, leaving uncertainty about surgical patients. Therefore, the aim of this study was to build and validate a model predicting 6-month survival after BM resection using different machine learning (ML) algorithms. METHODS: An institutional database of 1062 patients who underwent resection for BM was split into an 80:20 training and testing set. Seven different ML algorithms were trained and assessed for performance. Moreover, an ensemble model was created incorporating random forest, adaptive boosting, gradient boosting, and logistic regression algorithms. Five-fold cross-validation was used for hyperparameter tuning. Model performance was assessed using the area under the receiver-operating curve (AUC) and calibration, and was compared against the diagnosis-specific graded prognostic assessment (ds-GPA), the most established prognostic model for BMs. RESULTS: The ensemble model showed superior performance, with an AUC of 0.81 in the hold-out test set, a calibration slope of 1.14, and a calibration intercept of -0.08, outperforming the ds-GPA (AUC 0.68). Patients were stratified into high-, medium-, and low-risk groups for death at 6 months; these strata strongly predicted both 6-month and longitudinal overall survival (p < 0.001). CONCLUSIONS: We developed and internally validated an ensemble ML model that accurately predicts 6-month survival after neurosurgical resection for BM, outperforms the most established model in the literature, and allows for meaningful risk stratification. Future efforts should focus on external validation of our model.
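The ensembling and stratification steps described above can be sketched simply: average each base model's predicted 6-month death probability per patient, then bin the averaged probability into risk tiers. The probabilities and cut-offs below are illustrative assumptions, not the study's values.

```python
# Sketch: average base-model probabilities, then stratify into risk tiers.
# Thresholds are hypothetical; the study derives its strata empirically.

def ensemble_prob(per_model_probs):
    """Mean predicted probability across the base models for one patient."""
    return sum(per_model_probs) / len(per_model_probs)

def risk_group(p, low=0.33, high=0.66):
    """Bin an averaged probability into low / medium / high risk."""
    if p < low:
        return "low"
    if p < high:
        return "medium"
    return "high"

# Rows: one patient; columns: RF, AdaBoost, gradient boosting, logistic reg.
patients = [
    [0.10, 0.15, 0.12, 0.08],
    [0.45, 0.50, 0.40, 0.55],
    [0.80, 0.85, 0.90, 0.75],
]
groups = [risk_group(ensemble_prob(p)) for p in patients]
print(groups)  # ['low', 'medium', 'high']
```

Averaging is only one way to combine base learners; stacking, where a meta-model learns weights over the base predictions, is another common choice for this kind of ensemble.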

