Unsupervised multiple kernel learning for heterogeneous data integration

2017 ◽  
Vol 34 (6) ◽  
pp. 1009-1015 ◽  
Author(s):  
Jérôme Mariette ◽  
Nathalie Villa-Vialaneix

Abstract: Recent advances in high-throughput sequencing have expanded the breadth of available omics datasets, and the integrated analysis of multiple datasets obtained on the same samples has yielded important insights in a wide range of applications. However, integrating various sources of information remains a challenge for systems biology because the datasets produced are often of heterogeneous types, which calls for generic methods that take their specificities into account. We propose a multiple kernel framework that integrates multiple datasets of various types into a single exploratory analysis. Several solutions are provided to learn either a consensus meta-kernel or a meta-kernel that preserves the original topology of the datasets. We applied our framework to analyse two public multi-omics datasets. First, the multiple metagenomic datasets collected during the TARA Oceans expedition were explored to demonstrate that our method is able to retrieve previous findings in a single kernel PCA (KPCA) and to provide a new picture of the sample structure when a larger number of datasets is included in the analysis. To perform this analysis, a generic procedure is also proposed to improve the interpretability of the kernel PCA with respect to the original data. Second, the multi-omics breast cancer datasets provided by The Cancer Genome Atlas are analysed using kernel self-organizing maps with both single-omics and multi-omics strategies. The comparison of these two approaches demonstrates the benefit of our integration method for improving the representation of the studied biological system. The proposed methods are available in the R package mixKernel, released on CRAN. It is fully compatible with the mixOmics package, and a tutorial describing the approach can be found on the mixOmics web site http://mixomics.org/mixkernel/.
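As a rough illustration of the meta-kernel idea in the abstract above, the sketch below combines kernel matrices computed on the same samples into a single convex combination after cosine normalization. Uniform weights are an assumption made here for simplicity, whereas the framework described above learns the weights. The function names (`linear_kernel`, `cosine_normalize`, `combine_kernels`) and the toy data are hypothetical and are not part of the mixKernel API.

```python
import math

def linear_kernel(X):
    """Gram matrix of the linear kernel for the rows of X (a list of lists)."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def cosine_normalize(K):
    """Rescale K so that every diagonal entry is 1, putting kernels on one scale.

    Assumes every diagonal entry of K is strictly positive.
    """
    n = len(K)
    d = [math.sqrt(K[i][i]) for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

def combine_kernels(kernels, weights=None):
    """Convex combination of kernel matrices (uniform weights by default)."""
    m = len(kernels)
    if weights is None:
        weights = [1.0 / m] * m  # assumption: no weight learning in this sketch
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [
        [sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]

# Two toy data blocks measured on the same three samples (made-up numbers).
block_a = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
block_b = [[2.0, 1.0, 0.0], [0.0, 1.0, 4.0], [1.0, 1.0, 1.0]]

kernels = [cosine_normalize(linear_kernel(b)) for b in (block_a, block_b)]
meta_kernel = combine_kernels(kernels)
```

A non-negative combination of positive semi-definite kernels is itself a valid kernel, so `meta_kernel` can be passed to any kernel method, such as kernel PCA or a kernel self-organizing map.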


Author(s):  
Peiyan Wang ◽  
Dongfeng Cai

Multiple kernel learning (MKL) aims at learning an optimal combination of base kernels with which an appropriate hypothesis is determined on the training data. MKL is flexible because the kernel is learned automatically, and it reflects the fact that typical learning problems often involve multiple, heterogeneous data sources. The target kernel is one of the most important parts of many MKL methods, which find the kernel weights by maximizing the similarity, or alignment, between the weighted kernel and the target kernel. Existing target kernels are defined globally: (1) they assign the same target value to closer and farther sample pairs, and so neglect the variation among samples; (2) they are independent of the training data, and are hard to approximate with base kernels. As a result, maximizing the similarity to a global target kernel can make the pre-specified kernels less effectively utilized and reduce classification performance. In this paper, instead of defining a global target kernel, a localized target kernel is calculated for each sample pair from the training data, which is flexible and handles sample variation well. A new target kernel, named the empirical target kernel, is proposed to implement this idea, and three corresponding algorithms are designed to utilize it efficiently. Experiments are conducted on four challenging MKL problems. The results show that the proposed algorithms outperform other methods, verifying their effectiveness.
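To make the notion of alignment with a global target kernel concrete, the sketch below computes the standard kernel-target alignment, the Frobenius inner product of two kernel matrices divided by the product of their Frobenius norms, against the usual global target built from labels in {-1, +1}. It does not implement the empirical target kernel proposed in the paper, whose definition is not given in the abstract. The function names and toy matrices are hypothetical.

```python
import math

def frobenius_inner(A, B):
    """Frobenius inner product: sum of element-wise products of A and B."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))

def alignment(K1, K2):
    """Kernel alignment: <K1, K2>_F / (||K1||_F * ||K2||_F)."""
    return frobenius_inner(K1, K2) / math.sqrt(
        frobenius_inner(K1, K1) * frobenius_inner(K2, K2)
    )

def global_target(y):
    """Global target kernel y y^T: +1 for same-class pairs, -1 otherwise."""
    return [[yi * yj for yj in y] for yi in y]

# Toy labels and two made-up base kernels on four samples.
y = [1, 1, -1, -1]
K_good = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
]  # high similarity within classes, low similarity across classes
K_poor = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # identity

target = global_target(y)
scores = {"good": alignment(K_good, target), "poor": alignment(K_poor, target)}
```

The target `y y^T` assigns the same value to every same-class pair regardless of how close the two samples are, which is the limitation of global target kernels that the abstract describes.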


2017 ◽  
Author(s):  
Nisar Wani ◽  
Khalid Raza

Abstract: Computer-aided diagnosis is gradually making its way into the domain of medical research and clinical diagnosis. With the fields of radiology and diagnostic imaging producing petabytes of image data, machine learning tools, particularly kernel-based algorithms, seem to be an obvious choice for processing and analyzing this high-dimensional and heterogeneous data. In this chapter, after a brief description of the nature of medical images, image features, and the basics of machine learning and kernel methods, we present the application of multiple kernel learning algorithms to medical image analysis.


2017 ◽  
Vol 15 (01) ◽  
pp. 1650037 ◽  
Author(s):  
Tianci Song ◽  
Yan Wang ◽  
Wei Du ◽  
Sha Cao ◽  
Yuan Tian ◽  
...  

Breast cancer histologic grade represents the morphological assessment of the tumor’s malignancy and aggressiveness, which is vital for planning clinical treatment and estimating prognosis for patients. Therefore, the prediction of breast cancer grade can markedly improve the detection of early breast cancer and efficiently guide its treatment. With the advent of high-throughput profiling technology, large amounts of data of different types are rapidly generated, and each data type provides its own biological insight. Although many studies have focused on cancer grade prediction, hardly any of them has attempted to integrate multiple data types, which can not only improve the results obtained from a learning method but also give a better understanding or explanation of the underlying biology. In this paper, we use a supervised learning method called multiple kernel learning (MKL) to design a breast cancer grading predictor that fuses heterogeneous data for the classification of breast cancer histopathology. Furthermore, we modify our model to include biological pathway information. The new model can evaluate the significance of the various pathways that contain genes differentially expressed between breast cancer grades. The merits of the new model are that it bridges omics data and the phenotypes of the different breast cancer grades, and that it provides an auxiliary method for integrating omics data in research on cancer mechanisms. In experiments, the proposed method outperforms other state-of-the-art methods and offers rich biological interpretation of the differences between breast cancer grades.


2018 ◽  
Author(s):  
Christopher M. Wilson ◽  
Kaiqiao Li ◽  
Pei-Fen Kuan ◽  
Xuefeng Wang

Abstract: Advances in medical technology have allowed for customized prognosis, diagnosis, and personalized treatment regimens that utilize multiple heterogeneous data sources. Multiple kernel learning (MKL) is well suited for the integration of multiple high-throughput data sources; however, there are currently no implementations of MKL in R. In this paper, we give some background material on support vector machines (SVM) and introduce an R package, RMKL, which provides R and C++ code implementing several MKL algorithms for classification and regression problems. The provided implementations of MKL are compared using benchmark data and TCGA ovarian cancer data. We demonstrate that combining multiple data sources can lead to a better classification scheme than using a single data source.


2010 ◽  
Vol 11 (3) ◽  
pp. 292-298
Author(s):  
Hongjun SU ◽  
Yehua SHENG ◽  
Yongning WEN ◽  
Min CHEN

Author(s):  
Guo ◽  
Xiaoqian Zhang ◽  
Zhigui Liu ◽  
Xuqian Xue ◽  
Qian Wang ◽  
...  
