Multiple Kernel Learning with Gaussianity Measures

2012 ◽  
Vol 24 (7) ◽  
pp. 1853-1881 ◽  
Author(s):  
Hideitsu Hino ◽  
Nima Reyhani ◽  
Noboru Murata

Kernel methods are known to be effective for nonlinear multivariate analysis. One of the main issues in their practical use is the selection of the kernel, and there have been many studies on kernel selection and kernel learning. Multiple kernel learning (MKL) is one of the more promising kernel optimization approaches. Kernel methods have been applied to various classifiers, including Fisher discriminant analysis (FDA). FDA gives the Bayes-optimal classification axis when the data distribution of each class in the feature space is Gaussian with a shared covariance structure. Based on this fact, an MKL framework built on the notion of gaussianity is proposed. As a concrete implementation, an empirical characteristic function is adopted to measure gaussianity in the feature space associated with a convex combination of kernel functions, and two MKL algorithms are derived. Experimental results on several data sets show that the proposed kernel learning followed by FDA offers strong classification power.
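The core construction here, a convex combination of base kernels followed by kernel FDA on the combined Gram matrix, can be sketched as follows. This is a minimal NumPy illustration with made-up RBF bandwidths and fixed uniform weights; the paper's gaussianity-driven weight optimization via the empirical characteristic function is not reproduced.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    """Gram matrix of the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq)

def combined_gram(X, gammas, weights):
    """Convex combination of Gram matrices: K = sum_m w_m K_m, w_m >= 0, sum_m w_m = 1."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return sum(wm * rbf_kernel(X, X, g) for wm, g in zip(w, gammas))

# Toy two-class data (a stand-in for the paper's experiments).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

K = combined_gram(X, gammas=[0.1, 1.0, 10.0], weights=[1.0, 1.0, 1.0])

# Kernel FDA on the combined kernel: solve (N + eps*I) alpha = m1 - m0,
# where m_c are the class-mean kernel vectors and N is the within-class scatter.
m0 = K[:, y == 0].mean(axis=1)
m1 = K[:, y == 1].mean(axis=1)
N = np.zeros_like(K)
for c in (0, 1):
    Kc = K[:, y == c]
    l = Kc.shape[1]
    N += Kc @ (np.eye(l) - np.ones((l, l)) / l) @ Kc.T
alpha = np.linalg.solve(N + 1e-3 * np.eye(len(y)), m1 - m0)
scores = K @ alpha  # 1-D projection onto the discriminant axis
```

With separated classes, the projected scores of the two classes concentrate on opposite sides of the axis; the MKL step of the paper would additionally adjust `weights` so that each projected class looks as Gaussian as possible.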

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Ling Wang ◽  
Hongqiao Wang ◽  
Guangyuan Fu

Extensions of kernel methods to class-imbalance problems have been extensively studied. Although they cope well with nonlinear problems, their high computation and memory costs severely limit their application to real-world imbalanced tasks. The Nyström method is an effective technique for scaling kernel methods; however, the standard Nyström method must sample a sufficiently large number of landmark points to ensure an accurate approximation, which seriously hurts its efficiency. In this study, we propose a multi-Nyström method based on mixtures of Nyström approximations that avoids the explosion of the subkernel matrix, while the optimization of the mixture weights is embedded into the model training process by multiple kernel learning (MKL) algorithms to yield a more accurate low-rank approximation. Moreover, we select the subsets of landmark points according to the imbalance of the distribution to reduce the model’s sensitivity to skewness. We also provide a kernel-stability analysis of our method and show that the model solution error is bounded by the weighted approximation errors, which can help improve the learning process. Extensive experiments on several large-scale datasets show that our method achieves higher classification accuracy and a dramatic speedup of MKL algorithms.
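The standard Nyström building block that the multi-Nyström mixture is composed of can be sketched as follows. The landmark count, RBF bandwidth, and uniform sampling below are illustrative assumptions; the paper's MKL-learned mixture weights are only indicated in a comment.

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    """RBF Gram matrix between the rows of X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def nystrom(X, landmarks, gamma=0.5):
    """Low-rank Nystrom approximation K ~= C W^+ C^T from m landmark points."""
    C = rbf(X, landmarks, gamma)           # n x m cross-kernel
    W = rbf(landmarks, landmarks, gamma)   # m x m landmark kernel
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
idx = rng.choice(len(X), size=20, replace=False)  # uniform landmark sampling

K_full = rbf(X, X)
K_approx = nystrom(X, X[idx])
rel_err = np.linalg.norm(K_full - K_approx) / np.linalg.norm(K_full)

# A multi-Nystrom mixture in the spirit of the paper would combine several
# such approximations over different landmark subsets S_m, chosen according
# to the class distribution, with MKL-learned weights:
#     K ~= sum_m w_m * nystrom(X, S_m)
```

Because the Nyström approximation never materializes the full n x n kernel during training (only the n x m and m x m blocks), the cost drops from O(n²) memory to O(nm) with m ≪ n.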


Entropy ◽  
2020 ◽  
Vol 22 (7) ◽  
pp. 794
Author(s):  
Alessio Martino ◽  
Enrico De Santis ◽  
Alessandro Giuliani ◽  
Antonello Rizzi

Multiple kernel learning is a paradigm that employs a properly constructed chain of kernel functions able to simultaneously analyse different data or different representations of the same data. In this paper, we propose a hybrid classification system based on a linear combination of multiple kernels defined over multiple dissimilarity spaces. The core of the training procedure is the joint optimisation of the kernel weights and of the selection of representatives in the dissimilarity spaces. This equips the system with a two-fold knowledge discovery phase: by analysing the weights, it is possible to check which representations are more suitable for solving the classification problem, whereas the pivotal patterns selected as representatives can give further insight into the modelled system, possibly with the help of field experts. The proposed classification system is tested on real proteomic data in order to predict proteins’ functional role starting from their folded structure: specifically, a set of eight representations is drawn from the graph-based description of the folded protein. The proposed multiple-kernel system has also been benchmarked against a clustering-based classification system that can likewise exploit multiple dissimilarities simultaneously. Computational results show remarkable classification capabilities, and the knowledge discovery analysis is in line with current biological knowledge, suggesting the reliability of the proposed system.
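A minimal sketch of the dissimilarity-space construction and of a fixed linear combination of per-representation kernels. The prototypes, metrics, and weights below are illustrative assumptions; the paper's joint optimisation of weights and representatives is not reproduced.

```python
import numpy as np

def dissimilarity_space(X, prototypes, metric):
    """Embed each sample as its vector of dissimilarities to the prototypes."""
    return np.array([[metric(x, p) for p in prototypes] for x in X])

def combined_kernel(reps, weights, gamma=0.1):
    """Linear combination of RBF kernels, one per dissimilarity representation."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(reps[0])
    K = np.zeros((n, n))
    for wm, D in zip(w, reps):
        sq = ((D[:, None, :] - D[None, :, :]) ** 2).sum(axis=-1)
        K += wm * np.exp(-gamma * sq)
    return K

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 4))
protos = X[rng.choice(30, 5, replace=False)]        # representative selection
euclid = lambda a, b: np.linalg.norm(a - b)         # two example dissimilarities
cityblock = lambda a, b: np.abs(a - b).sum()
reps = [dissimilarity_space(X, protos, m) for m in (euclid, cityblock)]
K = combined_kernel(reps, weights=[0.7, 0.3])
```

In the paper's setting each dissimilarity space would come from a different graph-based protein representation, and the learned weights would expose which representations matter for the classification task.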


2018 ◽  
Vol 30 (3) ◽  
pp. 820-855 ◽  
Author(s):  
Wei Wang ◽  
Hao Wang ◽  
Chen Zhang ◽  
Yang Gao

Learning an appropriate distance metric plays a substantial role in the success of many learning machines. Conventional metric learning algorithms have limited utility when the training and test samples are drawn from related but different domains (i.e., a source domain and a target domain). In this letter, we propose two novel metric learning algorithms for domain adaptation in an information-theoretic setting, allowing discriminating power to be transferred, and standard learning machines to be propagated, across the two domains. In the first, a cross-domain Mahalanobis distance is learned by combining three goals: reducing the distribution difference between the domains, preserving the geometry of the target-domain data, and aligning the geometry of the source-domain data with the label information. Furthermore, to tackle more complex domain adaptation problems and go beyond linear cross-domain metric learning, we extend the first method to a multiple kernel learning framework. A convex combination of multiple kernels and a linear transformation are adaptively learned in a single optimization, which greatly benefits the exploration of prior knowledge and the description of the data characteristics. Comprehensive experiments on three real-world applications (face recognition, text classification, and object categorization) verify that the proposed methods outperform state-of-the-art metric learning and domain adaptation methods.
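A toy illustration of two of the ingredients: a Mahalanobis distance parameterized by M = LᵀL, and a linear-kernel MMD term measuring the distribution difference between domains. The projection that removes the mean shift is a hand-crafted stand-in for the learned transformation, not the authors' optimization.

```python
import numpy as np

def mahalanobis(x, y, M):
    """Mahalanobis distance d_M(x, y) = sqrt((x - y)^T M (x - y))."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

def mmd2(A, B):
    """Squared linear-kernel MMD between two samples: ||mean(A) - mean(B)||^2."""
    diff = A.mean(axis=0) - B.mean(axis=0)
    return float(diff @ diff)

rng = np.random.default_rng(3)
Xs = rng.normal(0.0, 1.0, size=(100, 3))   # source domain
Xt = rng.normal(0.5, 1.0, size=(100, 3))   # mean-shifted target domain

L = np.eye(3)                  # linear transform; M = L^T L is the metric
M = L.T @ L
d01 = mahalanobis(Xs[0], Xt[0], M)
gap_before = mmd2(Xs @ L.T, Xt @ L.T)

# Hand-crafted "learned" transform: project out the empirical mean-shift
# direction, which drives this (linear-kernel) MMD term to zero. A real
# objective would trade this off against geometry- and label-preserving terms.
shift = Xt.mean(axis=0) - Xs.mean(axis=0)
P = np.eye(3) - np.outer(shift, shift) / (shift @ shift)
gap_after = mmd2(Xs @ P.T, Xt @ P.T)
```

The kernelized extension in the paper replaces the fixed feature space with a convex combination of kernels, so that the transformation and the kernel weights are learned together.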


2019 ◽  
Vol 29 (02) ◽  
pp. 1850042 ◽  
Author(s):  
D. Collazos-Huertas ◽  
D. Cárdenas-Peña ◽  
G. Castellanos-Dominguez

The early detection of Alzheimer’s disease and the quantification of its progression pose multiple difficulties for machine learning algorithms. Two of the most relevant issues are missing data and the interpretability of results. To deal with both, we introduce a methodology for predicting the conversion of mild cognitive impairment patients to Alzheimer’s disease from structural brain MRI volumes. First, we use morphological measures of each brain structure to build an instance-based feature mapping that copes with missed follow-up visits. Then, the extracted feature mappings are combined into a single representation through a convex combination of reproducing kernels, with the weighting parameter of each structure tuned by maximizing the centered-kernel alignment criterion. We evaluate the proposed methodology with a couple of well-known classification machines on the ADNI database, which is devoted to assessing the combined prognostic value of several AD biomarkers. The experimental results show that our instance-based representation with multiple kernel learning enables detecting mild cognitive impairment as well as predicting conversion to Alzheimer’s disease within three years of the initial screening. Moreover, the brain structures with the largest combination weights are directly related to memory and cognitive functions.
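The centered-kernel alignment criterion used to tune the per-structure weights can be computed as below. The alignment-proportional weighting at the end is a simple illustrative heuristic, not necessarily the paper's exact optimization, and the data and bandwidths are made up.

```python
import numpy as np

def center(K):
    """Center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = len(K)
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    """Centered-kernel alignment between Gram matrices K and L (in [0, 1])."""
    Kc, Lc = center(K), center(L)
    return float((Kc * Lc).sum() / (np.linalg.norm(Kc) * np.linalg.norm(Lc)))

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (15, 3)), rng.normal(4.0, 1.0, (15, 3))])
y = np.array([-1] * 15 + [1] * 15)
Ky = np.outer(y, y).astype(float)          # ideal (label) kernel

# One RBF Gram matrix per candidate bandwidth (stand-ins for per-structure kernels).
grams = []
for g in (0.01, 0.1, 1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    grams.append(np.exp(-g * sq))

scores = [cka(K, Ky) for K in grams]
weights = np.array(scores) / sum(scores)   # alignment-proportional weighting
```

Kernels whose induced similarity structure matches the labels get high alignment and therefore large weights, which is what makes the learned weights interpretable per brain structure.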


2010 ◽  
Vol 22 (11) ◽  
pp. 2887-2923 ◽  
Author(s):  
Hideitsu Hino ◽  
Noboru Murata

Reducing the dimensionality of high-dimensional data without losing its essential information is an important task in information processing. When class labels of the training data are available, Fisher discriminant analysis (FDA) has been widely used. However, the optimality of FDA is guaranteed only under very restrictive ideal conditions, and it is often observed that FDA does not provide a good classification surface for many real problems. This letter treats the problem of supervised dimensionality reduction from the viewpoint of information theory and proposes a framework for dimensionality reduction based on class-conditional entropy minimization. The proposed linear dimensionality-reduction technique is validated both theoretically and experimentally. Then, through kernel Fisher discriminant analysis (KFDA), the multiple kernel learning problem is treated within the proposed framework, and a novel algorithm that iteratively optimizes the parameters of the classification function and the kernel combination coefficients is proposed. The algorithm is shown experimentally to be comparable to or better than KFDA on large-scale benchmark data sets, and comparable to other multiple kernel learning techniques on the yeast protein function annotation task.
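A rough sketch of choosing a projection by minimizing class-conditional entropy. As a stand-in for a proper entropy estimator, a Gaussian proxy (0.5·log 2πeσ²) is used per class, which reduces the criterion to within-class variance; the data and the grid search over directions are illustrative, not the paper's algorithm.

```python
import numpy as np

def gaussian_entropy_1d(z):
    """Gaussian proxy for differential entropy: 0.5 * log(2*pi*e*var(z))."""
    return 0.5 * np.log(2.0 * np.pi * np.e * z.var() + 1e-12)

def class_conditional_entropy(X, y, w):
    """Sum of per-class entropies of the data projected onto direction w."""
    w = w / np.linalg.norm(w)
    return sum(gaussian_entropy_1d(X[y == c] @ w) for c in np.unique(y))

rng = np.random.default_rng(5)
# Two classes tightly concentrated along the x-axis; the y-axis is pure noise.
X = np.vstack([rng.normal([0.0, 0.0], [0.3, 2.0], (50, 2)),
               rng.normal([3.0, 0.0], [0.3, 2.0], (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Grid search over 1-D projection directions in the plane.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
ents = [class_conditional_entropy(X, y, w) for w in dirs]
best = dirs[int(np.argmin(ents))]   # direction minimizing class-conditional entropy
```

On this toy example the minimizer is (close to) the x-axis, along which each class is most concentrated; the letter's framework uses a genuine entropy estimate and, in the kernelized version, alternates this step with updates of the kernel combination coefficients.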


Author(s):  
Guo ◽  
Xiaoqian Zhang ◽  
Zhigui Liu ◽  
Xuqian Xue ◽  
Qian Wang ◽  
...  
