Accurate cancer phenotype prediction with AKLIMATE, a stacked kernel learner integrating multimodal genomic data and pathway knowledge

2021, Vol 17 (4), pp. e1008878
Author(s): Vladislav Uzunangelov, Christopher K. Wong, Joshua M. Stuart

Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, many databases have amassed information about pathways and gene "signatures" (patterns of gene expression associated with specific cellular and phenotypic contexts). An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets. However, few such integrative approaches exist that also provide interpretable results quantifying the importance of individual genes and pathways to model accuracy. We introduce AKLIMATE, the first kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks. AKLIMATE uses a novel multiple-kernel learning framework where individual kernels capture the prediction propensities recorded in random forests, each built from a specific pathway gene set that integrates all omics data for its member genes. AKLIMATE has comparable or improved performance relative to state-of-the-art methods on diverse phenotype learning tasks, including predicting microsatellite instability in endometrial and colorectal cancer, survival in breast cancer, and cell line response to gene knockdowns. We show how AKLIMATE is able to connect feature data across data platforms through their common pathways to identify examples of several known and novel contributors to cancer and synthetic lethality.
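AKLIMATE's kernels are derived from random forests trained on the features of each pathway. As a rough illustration only, the sketch below assumes a standard random-forest proximity kernel (the fraction of trees in which two samples fall into the same leaf) and combines per-pathway kernels with a fixed weighted sum; the paper's exact kernel definition and its learned kernel weights may differ. The leaf assignments, weights, and the names `proximity_kernel` and `combine_kernels` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def proximity_kernel(leaves: Sequence[Sequence[int]]) -> Matrix:
    """Random-forest proximity: fraction of trees in which samples i and j share a leaf.

    leaves[i][t] is the (hypothetical) leaf index of sample i in tree t.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(1 for t in range(n_trees) if leaves[i][t] == leaves[j][t])
            K[i][j] = shared / n_trees
    return K


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernels, as in multiple-kernel learning (weights fixed here, not learned)."""
    n = len(kernels[0])
    return [
        [sum(w * Km[i][j] for Km, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical leaf assignments for 3 samples in two pathway-specific forests of 2 trees each.
pathway_a = [[0, 1], [0, 1], [2, 3]]
pathway_b = [[5, 7], [6, 7], [5, 7]]
K_a = proximity_kernel(pathway_a)
K_b = proximity_kernel(pathway_b)
# Samples 0 and 1 agree fully in pathway A but only partly in pathway B.
K = combine_kernels([K_a, K_b], [0.7, 0.3])
```

Because each kernel only sees one pathway's features, the weight a kernel receives gives a pathway-level reading of which gene sets drive the prediction.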

2020
Author(s): Vladislav Uzunangelov, Christopher K. Wong, Joshua M. Stuart

Advancements in sequencing have led to the proliferation of multi-omic profiles of human cells under different conditions and perturbations. In addition, several databases have amassed information about pathways and gene "signatures" (patterns of gene expression associated with specific cellular and phenotypic contexts). An important current challenge in systems biology is to leverage such knowledge about gene coordination to maximize the predictive power and generalization of models applied to high-throughput datasets. However, few such integrative approaches exist that also provide interpretable results quantifying the importance of individual genes and pathways to model accuracy. We introduce AKLIMATE, the first kernel-based stacked learner that seamlessly incorporates multi-omics feature data with prior information in the form of pathways for either regression or classification tasks. AKLIMATE uses a novel multiple-kernel learning framework where individual kernels capture the prediction propensities recorded in random forests, each built from a specific pathway gene set that integrates all omics data for its member genes. AKLIMATE outperforms state-of-the-art methods on diverse phenotype learning tasks, including predicting microsatellite instability in endometrial and colorectal cancer, survival in breast cancer, and cell line response to gene knockdowns. We show how AKLIMATE is able to connect feature data across data platforms through their common pathways to identify examples of several known and novel contributors to cancer and synthetic lethality.


2021, Vol 6 (1)
Author(s): Peter Morales, Rajmonda Sulo Caceres, Tina Eliassi-Rad

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low-quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and a notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.
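NAC trains a deep actor-critic agent offline, with a task-specific network embedding as its state. The tabular sketch below strips that down to the core one-step actor-critic update: the critic's temporal-difference error both corrects the state-value estimate and scales the policy-gradient step for the actor. The state labels, the learning rates, and the reward (1 when a probed vertex turns out to have the target attribute, in the spirit of selective harvesting) are illustrative assumptions, not NAC's implementation.

```python
import math
from collections import defaultdict


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


class TabularActorCritic:
    """One-step actor-critic with a softmax policy over a fixed number of actions."""

    def __init__(self, n_actions, alpha_actor=0.1, alpha_critic=0.1, gamma=0.9):
        self.n_actions = n_actions
        self.alpha_actor = alpha_actor
        self.alpha_critic = alpha_critic
        self.gamma = gamma
        self.prefs = defaultdict(lambda: [0.0] * n_actions)  # actor: action preferences per state
        self.values = defaultdict(float)  # critic: state-value estimates

    def policy(self, state):
        return softmax(self.prefs[state])

    def update(self, state, action, reward, next_state, done):
        target = reward if done else reward + self.gamma * self.values[next_state]
        delta = target - self.values[state]  # TD error from the critic
        self.values[state] += self.alpha_critic * delta
        probs = self.policy(state)  # computed before the actor's preferences change
        prefs = self.prefs[state]
        for a in range(self.n_actions):
            # gradient of log pi(action|state) w.r.t. preference a: 1[a == action] - pi(a|state)
            grad = (1.0 if a == action else 0.0) - probs[a]
            prefs[a] += self.alpha_actor * delta * grad
        return delta


# Hypothetical step: in state "s0", probing candidate vertex 1 harvested a target vertex.
agent = TabularActorCritic(n_actions=2)
before = agent.policy("s0")[1]
agent.update("s0", action=1, reward=1.0, next_state="s1", done=False)
after = agent.policy("s0")[1]
print(round(before, 3), round(after, 3))  # 0.5 0.525
```

A positive TD error makes the rewarded probe more likely in that state; in NAC the tables would be replaced by neural networks over embedded network states.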


2017, Vol 162 (1), pp. 191-198
Author(s): Rajesh Ramanathan, Amy L. Olex, Mikhail Dozmorov, Harry D. Bear, Leopoldo Jose Fernandez, ...

2020, Vol 12 (12), pp. 1964
Author(s): Mengbin Rao, Ping Tang, Zheng Zhang

Hyperspectral images (HSIs) captured by different sensors often contain different numbers of bands, whereas most convolutional neural networks (CNNs) require a fixed-size input. Enabling deep CNNs to generalize across such heterogeneous inputs to achieve better classification performance has therefore become a research focus. For classification tasks with limited labeled samples, training CNNs on sample pairs instead of single samples has proven to be an efficient approach. Following this strategy, we propose a Siamese CNN with a three-dimensional (3D) adaptive spatial-spectral pyramid pooling (ASSP) layer, called ASSP-SCNN, that takes 3D sample pairs of varying size as input and can easily be transferred to another HSI dataset regardless of its number of spectral bands. The 3D ASSP layer also extracts 3D information at different levels to improve the classification performance of the CNN it is added to. To evaluate the classification and generalization performance of ASSP-SCNN, our experiments consist of two parts: ASSP-SCNN without pre-training, and an ASSP-SCNN-based transfer learning framework. Experimental results on three HSI datasets demonstrate that both ASSP-SCNN without pre-training and transfer learning based on ASSP-SCNN achieve higher classification accuracies than several state-of-the-art CNN-based methods. We also compare the performance of ASSP-SCNN on different transfer learning tasks, which further verifies that ASSP-SCNN has a strong generalization capability.


2020
Author(s): Yeping Lina Qiu, Hong Zheng, Arnout Devos, Olivier Gevaert

RNA sequencing has emerged as a promising approach in cancer prognosis as sequencing data become more easily and affordably accessible. However, it remains challenging to build good predictive models, especially when the sample size is limited and the number of features is high, which is a common situation in biomedical settings. To address these limitations, we propose a meta-learning framework based on neural networks for survival analysis and evaluate it in a genomic cancer research setting. We demonstrate that, compared to regular transfer learning, meta-learning is a significantly more effective paradigm for leveraging high-dimensional data that are relevant to, but not directly related to, the problem of interest. Specifically, meta-learning explicitly uses abundant data from related tasks to construct a model that can learn a new task effectively from few samples. For the application of predicting cancer survival outcome, we also show that the meta-learning framework with a few samples achieves performance competitive with learning from scratch on a significantly larger number of samples. Finally, we demonstrate that the meta-learning model implicitly prioritizes genes based on their contribution to survival prediction and allows us to identify important pathways in cancer.
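The abstract does not name the meta-learning algorithm, and the paper works with neural survival models. Purely to illustrate the idea of learning, from related tasks, an initialization that adapts quickly to a data-poor task, the sketch below swaps in Reptile, a first-order meta-learning update, on toy one-parameter linear regression tasks. The tasks, learning rates, and function names are hypothetical.

```python
from typing import Sequence, Tuple

Task = Tuple[Sequence[float], Sequence[float]]  # (inputs x, targets y)


def adapt(w: float, task: Task, lr: float, steps: int) -> float:
    """A few full-batch gradient steps on mean squared error for y ~ w * x."""
    xs, ys = task
    for _ in range(steps):
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def reptile(tasks: Sequence[Task], meta_lr: float = 0.1, inner_lr: float = 0.1,
            inner_steps: int = 5, epochs: int = 200) -> float:
    """Reptile: move the shared initialization toward each task's adapted weight."""
    phi = 0.0
    for _ in range(epochs):
        for task in tasks:
            w = adapt(phi, task, inner_lr, inner_steps)
            phi += meta_lr * (w - phi)
    return phi


# Hypothetical related tasks with abundant data: slopes 1.5, 2.0 and 2.5.
xs = [0.5, 1.0, 1.5]
related_tasks = [(xs, [a * x for x in xs]) for a in (1.5, 2.0, 2.5)]
phi = reptile(related_tasks)

# A new task (slope 2.2) with only two samples, adapted from the meta-learned start.
new_task = ([0.5, 1.0], [1.1, 2.2])
w_new = adapt(phi, new_task, lr=0.1, steps=3)
```

The meta-learned starting point lands near the center of the related tasks, so three gradient steps on two samples already move it toward the new task's slope, whereas a start at zero would need many more.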


2016, pp. 1245-1292
Author(s): Muhammad Ibrahim, Manzur Murshed

Ranking a set of documents by their relevance to a given query is a central problem of information retrieval (IR). Traditionally, unsupervised scoring methods such as tf-idf, BM25, and language models have been used, but recently supervised machine learning has been applied successfully to learn a ranking function, a task known as the learning-to-rank (LtR) problem. There are a few surveys on LtR in the literature, but they offer little assistance to a reader who, before delving into the technical details of individual algorithms, wants a broad understanding of LtR systems and of their evolution from, and relation to, traditional IR methods. This chapter aims to address this gap. It mainly discusses the following aspects: the fundamental concepts of IR, the motivation behind LtR, the evolution of LtR from and its relation to traditional methods, the relationship between LtR and other supervised machine learning tasks, the general issues pertaining to an LtR algorithm, and the theory of LtR.
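As a reference point for the unsupervised scoring functions the chapter contrasts with LtR, here is a minimal Okapi BM25 scorer over pre-tokenized text. It assumes the common defaults k1 = 1.2 and b = 0.75 and uses the log(1 + ...) idf variant (as in Lucene) so that idf stays non-negative; the toy documents are made up. Scores of this kind are also commonly fed to learned rankers as input features.

```python
import math
from collections import Counter
from typing import List, Sequence


def bm25_scores(query: Sequence[str], docs: Sequence[Sequence[str]],
                k1: float = 1.2, b: float = 0.75) -> List[float]:
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency of each distinct query term
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # term-frequency saturation (k1) and document-length normalization (b)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "learning to rank with gradient boosted trees".split(),
    "bm25 is an unsupervised ranking function".split(),
    "cooking pasta at home".split(),
]
scores = bm25_scores("unsupervised ranking".split(), docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 1
```

Exact token matching is also a limitation LtR can help with: the first document mentions "rank" but scores zero for the query term "ranking".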


2015, Vol 2015, pp. 1-9
Author(s): Jian-Sheng Wu, Hai-Feng Hu, Shan-Cheng Yan, Li-Hua Tang

Nature often brings several domains together to form multidomain and multifunctional proteins with a vast number of possibilities. In our previous study, we showed that protein function prediction is naturally and inherently a Multi-Instance Multilabel (MIML) learning task. Automated protein function prediction is typically implemented under the assumption that the functions of labeled proteins are complete; that is, there are no missing labels. In practice, however, only a subset of a protein's functions is known, and whether the protein has other functions is unknown. Protein function prediction tasks therefore suffer from the weak-label problem, and protein function prediction with incomplete annotation fits well into the MIML with weak-label learning framework. In this paper, we apply the state-of-the-art MIML with weak-label learning algorithm MIMLwel to predict protein functions in two typical real-world electricigens, organisms that have been widely used in microbial fuel cell (MFC) research. Our experimental results validate the effectiveness of the MIMLwel algorithm in predicting protein functions with incomplete annotation.
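The weak-label point can be made concrete with a small multi-instance multi-label sketch. A protein is treated as a bag of instances (one hypothetical feature vector per domain), each label is scored by the maximum instance score, and a hinge loss either treats unannotated functions as negatives (the complete-annotation assumption) or ignores them as unknown. This is not the MIMLwel algorithm; the weights, label names, and loss are illustrative assumptions.

```python
from typing import Dict, List, Set

Bag = List[List[float]]  # one feature vector per instance (e.g., per domain)


def bag_scores(bag: Bag, weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each label as the max over instances of a linear instance score
    (multi-instance assumption: one positive instance makes the bag positive)."""
    return {
        label: max(sum(w * x for w, x in zip(wv, inst)) for inst in bag)
        for label, wv in weights.items()
    }


def hinge_loss(scores: Dict[str, float], positives: Set[str],
               treat_missing_as_negative: bool) -> float:
    loss = 0.0
    for label, s in scores.items():
        if label in positives:
            loss += max(0.0, 1.0 - s)
        elif treat_missing_as_negative:
            # complete-annotation assumption: an unlisted function counts as a negative
            loss += max(0.0, 1.0 + s)
        # weak-label view: an unlisted function is unknown and adds no loss
    return loss


# Hypothetical protein with two domains and two candidate functions.
bag = [[1.0, 0.0], [0.0, 1.0]]
weights = {"electron_transfer": [2.0, 0.0], "metal_binding": [0.0, 1.5]}
scores = bag_scores(bag, weights)
positives = {"electron_transfer"}  # only this function is annotated
strict = hinge_loss(scores, positives, treat_missing_as_negative=True)
weak = hinge_loss(scores, positives, treat_missing_as_negative=False)
print(weak, strict)  # 0.0 2.5
```

Under the complete-annotation assumption, the model is penalized for predicting "metal_binding", which may simply be unannotated. Ignoring unknowns alone is not enough either, since predicting every label would then cost nothing; real weak-label methods add further assumptions to rule that out.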


2019, Vol 35 (24), pp. 5137-5145
Author(s): Onur Dereli, Ceyda Oğuz, Mehmet Gönen

Motivation: Survival analysis methods that integrate pathways/gene sets into their learning model could identify molecular mechanisms that determine the survival characteristics of patients. Rather than first picking the predictive pathways/gene sets from a given collection and then training a predictive model on the subset of genomic features mapped to these selected pathways/gene sets, we developed a novel machine learning algorithm (Path2Surv) that performs these two steps jointly using multiple kernel learning.
Results: We extensively tested our Path2Surv algorithm on 7655 patients from 20 cancer types using cancer-specific pathway/gene set collections and gene expression profiles of these patients. Path2Surv statistically significantly outperformed survival random forest (RF) on 12 out of 20 datasets and obtained predictive performance comparable to survival support vector machine (SVM) while using significantly fewer gene expression features (i.e., fewer than 10% of the features used by survival RF and survival SVM).
Availability and implementation: Our implementations of the survival SVM and Path2Surv algorithms in R are available at https://github.com/mehmetgonen/path2surv together with the scripts that replicate the reported experiments.
Supplementary information: Supplementary data are available at Bioinformatics online.
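In pathway-based multiple kernel learning, each kernel is computed only on the genes of one pathway/gene set, so a kernel weight of zero removes that gene set from the model. If the learned weights are sparse, that would explain how the method ends up using a small fraction of the features. The sketch assumes Gaussian kernels and fixed, hypothetical kernel weights standing in for those Path2Surv learns jointly with its survival SVM; the data, gene sets, and function names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Set

Matrix = List[List[float]]


def gaussian_kernel_on_subset(X: Sequence[Sequence[float]], idx: Sequence[int],
                              gamma: float = 1.0) -> Matrix:
    """Gaussian (RBF) kernel computed only on the feature columns listed in idx."""
    n = len(X)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((X[i][f] - X[j][f]) ** 2 for f in idx)
            K[i][j] = math.exp(-gamma * d2)
    return K


def features_used(gene_sets: Dict[str, Sequence[int]], weights: Dict[str, float]) -> Set[int]:
    """Union of the features in gene sets whose kernel weight is non-zero."""
    used: Set[int] = set()
    for name, idx in gene_sets.items():
        if weights.get(name, 0.0) > 0.0:
            used.update(idx)
    return used


# Hypothetical expression matrix: 3 patients x 5 genes.
X = [[0.2, 1.0, 0.5, 0.3, 0.9],
     [0.1, 0.8, 0.7, 0.2, 1.1],
     [0.9, 0.1, 0.4, 0.6, 0.2]]
gene_sets = {"set_a": [0, 1], "set_b": [2, 3], "set_c": [1, 4]}
kernels = {name: gaussian_kernel_on_subset(X, idx) for name, idx in gene_sets.items()}
weights = {"set_a": 0.8, "set_b": 0.0, "set_c": 0.2}  # hypothetical sparse weights
used = features_used(gene_sets, weights)
print(sorted(used))  # [0, 1, 4]
```

Here only three of the five genes remain in the model, and the surviving gene sets can be read directly as the pathways associated with survival.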

