Network as a Biomarker: A Novel Network-Based Sparse Bayesian Machine for Pathway-Driven Drug Response Prediction

Genes ◽  
2019 ◽  
Vol 10 (8) ◽  
pp. 602 ◽  
Author(s):  
Qi Liu ◽  
Louis J. Muglia ◽  
Lei Frank Huang

With advances in biological networks, including gene regulation, gene co-expression, and protein–protein interaction networks, and in approaches for network reconstruction, analysis, and interpretation, it is possible to discover reliable and accurate molecular network-based biomarkers for monitoring cancer treatment. Such efforts will also pave the way toward biomarker-driven personalized medicine against cancer. Previously, we reconstructed disease-specific driver signaling networks using multi-omics profiles and cancer signaling pathway data. In this study, we developed a network-based sparse Bayesian machine (NBSBM) approach that uses these disease-specific driver signaling networks to predict cancer cell responses to drugs. NBSBM exploits the information encoded in a disease-specific (differentially expressed) network to improve prediction performance in problems with a reduced amount of training data and a very high-dimensional feature space. Sparsity in NBSBM is favored by a spike-and-slab prior distribution, combined with a Markov random field prior that encodes the network of feature dependencies; gene features that are connected in the network are assumed to be jointly relevant or jointly irrelevant to drug responses. We compared the proposed method with two network-based support vector machine (NBSVM) approaches and found that NBSBM achieved much better accuracy than either NBSVM method. The gene modules selected from the disease-specific driver networks for predicting drug sensitivity may be directly involved in drug sensitivity or resistance. This work provides a disease-specific network-based drug-sensitivity prediction approach that can uncover potential mechanisms of drug action by selecting the most predictive sub-networks from the disease-specific network.
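The key modeling idea, a spike-and-slab sparsity prior coupled to a Markov random field over the gene network so that connected genes are selected or excluded together, can be sketched numerically. The Ising-style energy below is an illustrative assumption, not the paper's exact formulation; the hyperparameter values are made up for the toy example.

```python
import numpy as np

def mrf_log_prior(z, adjacency, sparsity=-1.0, coupling=0.5):
    """Unnormalized log-prior of inclusion indicators z (0/1 per gene).

    sparsity < 0 penalizes selecting features (the "spike" favors exclusion);
    coupling > 0 rewards neighboring genes sharing the same indicator.
    """
    z = np.asarray(z, dtype=float)
    unary = sparsity * z.sum()          # global sparsity pressure
    agree = 2 * z - 1                   # map {0, 1} -> {-1, +1}
    pairwise = 0.5 * coupling * agree @ adjacency @ agree
    return unary + pairwise

# Toy 4-gene network: genes 0-1-2 form a path, gene 3 is isolated.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [0, 0, 0, 0]])

# Selecting a connected module scores higher than scattered genes.
connected = mrf_log_prior([1, 1, 1, 0], A)
scattered = mrf_log_prior([1, 0, 1, 1], A)
print(connected > scattered)  # the MRF prior favors the connected module
```

In a full sampler this log-prior would be combined with the slab likelihood of the selected coefficients and explored by Gibbs or MCMC updates over z.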

2011 ◽  
Vol 2011 ◽  
pp. 1-28 ◽  
Author(s):  
Zhongqiang Chen ◽  
Zhanyan Liang ◽  
Yuan Zhang ◽  
Zhongrong Chen

Grayware encyclopedias collect known species to provide information for incident analysis; however, their lack of categorization and generalization capability renders them ineffective for developing defense strategies against clustered strains. A grayware categorization framework is therefore proposed here, not only to classify grayware according to diverse taxonomic features but also to facilitate evaluation of the risk grayware poses to cyberspace. Armed with Support Vector Machines, the framework builds learning models from training data extracted automatically from grayware encyclopedias and visualizes categorization results with Self-Organizing Maps. The features used in the learning models are selected by information gain, and the high dimensionality of the feature space is reduced by word stemming and stopword removal. The grayware categorizations over diverse features reveal that grayware typically attempts to improve its penetration rate by resorting to multiple installation mechanisms and reduced code footprints. The framework also shows that grayware evades detection by attacking victims' security applications and resists removal by enhancing its clotting capability with infected hosts. Our analysis further points out that species in the categories Spyware and Adware continue to dominate the grayware landscape and impose extremely critical threats to the Internet ecosystem.
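The text-classification pipeline described (stopword removal, information-gain feature selection, SVM) can be sketched with scikit-learn; the four toy documents and labels below are invented stand-ins for encyclopedia entries, and mutual information is used as the information-gain criterion.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-ins for grayware encyclopedia descriptions.
docs = [
    "toolbar injects popup advertisements into the browser",
    "adware bundle displays popup banners and ad windows",
    "keylogger records keystrokes and sends logs to remote host",
    "spyware monitors keystrokes and exfiltrates browsing history",
]
labels = ["Adware", "Adware", "Spyware", "Spyware"]

model = make_pipeline(
    CountVectorizer(stop_words="english"),   # tokenize + stopword removal
    SelectKBest(mutual_info_classif, k=5),   # information-gain style selection
    LinearSVC(),                             # SVM classifier
)
model.fit(docs, labels)
print(model.predict(["popup ads appear in the browser toolbar"]))
```

A production pipeline would add a stemmer (e.g. via a custom tokenizer) before vectorization, matching the stemming step the abstract mentions.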


2001 ◽  
Vol 13 (7) ◽  
pp. 1443-1471 ◽  
Author(s):  
Bernhard Schölkopf ◽  
John C. Platt ◽  
John Shawe-Taylor ◽  
Alex J. Smola ◽  
Robert C. Williamson

Suppose you are given some data set drawn from an underlying probability distribution P and you want to estimate a “simple” subset S of input space such that the probability that a test point drawn from P lies outside of S equals some a priori specified value between 0 and 1. We propose a method to approach this problem by trying to estimate a function f that is positive on S and negative on the complement. The functional form of f is given by a kernel expansion in terms of a potentially small subset of the training data; it is regularized by controlling the length of the weight vector in an associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which we do by carrying out sequential optimization over pairs of input patterns. We also provide a theoretical analysis of the statistical performance of our algorithm. The algorithm is a natural extension of the support vector algorithm to the case of unlabeled data.
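This algorithm is implemented in scikit-learn as `OneClassSVM`, where the `nu` parameter plays the role of the a priori specified bound on the fraction of points falling outside the estimated set S. A minimal sketch on synthetic data:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))   # training sample drawn from P

# nu upper-bounds the fraction of training points left outside S.
clf = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5).fit(X)

# The learned f(x) is positive inside the estimated set S, negative outside.
inside = clf.predict(X)                   # +1 inside S, -1 on the complement
outlier_fraction = np.mean(inside == -1)  # bounded above by nu (nu-property)
print(round(outlier_fraction, 2))
```

Only a subset of the training points (the support vectors) enter the kernel expansion of f, matching the "potentially small subset" noted in the abstract.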


Author(s):  
Ruslan Babudzhan ◽  
Konstantyn Isaienkov ◽  
Danilo Krasiy ◽  
Oleksii Vodka ◽  
Ivan Zadorozhny ◽  
...  

The paper investigates the relationship between the vibration acceleration of bearings and their operational state. To determine these dependencies, a test bench was built and 112 experiments were carried out with different bearings: 100 bearings that developed an internal defect during operation and 12 bearings without a defect. From the obtained records, a dataset was formed and used to build classifiers; the dataset is freely available. A method for classifying new and used bearings was proposed, which consists of searching for dependencies and regularities in the signal using descriptive functions: statistical measures, entropy, fractal dimensions, and others. In addition to processing the signal itself, the frequency domain of the bearing operation signal was used to complement the feature space. The paper considered the possibility of generalizing the classification for application to signals that were not obtained in the course of laboratory experiments. An extraneous dataset was found in the public domain and used to determine how accurate a classifier is when trained and tested on significantly different signals. Training and validation were carried out using the bootstrapping method to eradicate the effect of randomness, given the small amount of training data available. To estimate the quality of the classifiers, the F1-measure was used as the main metric due to the imbalance of the datasets. The following supervised machine learning methods were chosen as classifier models: logistic regression, support vector machine, random forest, and K nearest neighbors. The results are presented in the form of density distribution plots and diagrams.
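The feature-extraction step, summarizing each vibration record by statistical, entropy, and frequency-domain descriptors, can be sketched with numpy. The specific descriptors and the synthetic test record below are illustrative assumptions, not the paper's exact feature set.

```python
import numpy as np

def extract_features(signal, fs=10_000):
    """Summarize a vibration record by a few descriptive functions."""
    signal = np.asarray(signal, dtype=float)
    rms = np.sqrt(np.mean(signal ** 2))                        # statistical
    kurtosis = np.mean((signal - signal.mean()) ** 4) / np.var(signal) ** 2
    crest = np.max(np.abs(signal)) / rms
    # Shannon entropy of the amplitude histogram
    hist, _ = np.histogram(signal, bins=32)
    p = hist / hist.sum()
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    # frequency-domain complement: dominant frequency of the spectrum
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    dominant_hz = freqs[np.argmax(spectrum[1:]) + 1]           # skip DC bin
    return np.array([rms, kurtosis, crest, entropy, dominant_hz])

# Toy record: a 150 Hz tone plus noise, standing in for a bearing signal.
t = np.arange(0, 1, 1 / 10_000)
record = np.sin(2 * np.pi * 150 * t) \
    + 0.1 * np.random.default_rng(1).normal(size=t.size)
features = extract_features(record)
print(features.round(2))
```

Feature vectors like this one, computed per record, would then feed the logistic regression, SVM, random forest, and KNN classifiers compared in the paper.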


Author(s):  
Yuto Omae ◽  
Hirotaka Takahashi ◽  

In recent years, many studies have been performed on the automatic classification of human body motions from inertia sensor data using a combination of inertia sensors and machine learning; this requires training data in which sensor data and body motions correspond to one another. It can be difficult to conduct experiments involving a large number of subjects over an extended time period because of concern for subject fatigue or injury. Many studies therefore have a small number of subjects repeatedly perform the body motions to be classified, and build training data from the acquired measurements. Classifiers constructed from such training data suffer from generalization errors caused by individual and trial differences. To suppress these errors, we must obtain feature spaces that are less likely to generate generalization errors due to individual and trial differences, and to obtain such feature spaces we require indices that evaluate this likelihood. This paper therefore aims to devise such evaluation indices from the perspective of individual and trial differences. The proposed indices are obtained by first constructing probability distributions of the acquired data that represent individual and trial differences, and then using these distributions to calculate the risk of generating generalization errors. We verified the effectiveness of the proposed evaluation method by applying it to sensor data for butterfly and breaststroke swimming; for comparison, we also applied several existing evaluation methods. We constructed classifiers for butterfly and breaststroke swimming by applying a support vector machine to the feature spaces obtained by the proposed and existing methods. Based on accuracy verification with test data, the proposed method produced a significantly higher F-measure than the existing methods, which shows that the proposed evaluation indices yield a feature space less likely to generate generalization errors due to individual and trial differences.
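The general construction, fit per-subject distributions of a feature, then score the feature space by how strongly subjects diverge from the pooled distribution, can be illustrated with a deliberately simplified index. The Gaussian fit and Bhattacharyya distance below are assumptions made for illustration, not the paper's actual indices.

```python
import numpy as np

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two 1-D Gaussians (lower = more overlap)."""
    avg_var = (var1 + var2) / 2.0
    return (0.25 * (mu1 - mu2) ** 2 / avg_var
            + 0.5 * np.log(avg_var / np.sqrt(var1 * var2)))

def individual_difference_risk(features_by_subject):
    """Mean distance of each subject's feature distribution from the pooled one."""
    pooled = np.concatenate(features_by_subject)
    mu_p, var_p = pooled.mean(), pooled.var()
    dists = [bhattacharyya_1d(s.mean(), s.var(), mu_p, var_p)
             for s in features_by_subject]
    return float(np.mean(dists))

rng = np.random.default_rng(0)
# Feature A: subjects agree; feature B: strong per-subject offsets.
feat_a = [rng.normal(0.0, 1.0, 200) for _ in range(3)]
feat_b = [rng.normal(off, 1.0, 200) for off in (-3.0, 0.0, 3.0)]

# The subject-dependent feature scores a higher generalization risk.
print(individual_difference_risk(feat_a) < individual_difference_risk(feat_b))
```

A feature with a low index value behaves consistently across subjects, so a classifier trained on it is less exposed to individual-difference generalization error.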


2007 ◽  
Vol 19 (7) ◽  
pp. 1919-1938 ◽  
Author(s):  
Jooyoung Park ◽  
Daesung Kang ◽  
Jongho Kim ◽  
James T. Kwok ◽  
Ivor W. Tsang

The support vector data description (SVDD) is one of the best-known one-class support vector learning methods, in which one tries the strategy of using balls defined on the feature space in order to distinguish a set of normal data from all other possible abnormal objects. The major concern of this letter is to extend the main idea of SVDD to pattern denoising. Combining the geodesic projection to the spherical decision boundary resulting from the SVDD, together with solving the preimage problem, we propose a new method for pattern denoising. We first solve SVDD for the training data and then for each noisy test pattern, obtain its denoised feature by moving its feature vector along the geodesic on the manifold to the nearest decision boundary of the SVDD ball. Finally we find the location of the denoised pattern by obtaining the pre-image of the denoised feature. The applicability of the proposed method is illustrated by a number of toy and real-world data sets.


2003 ◽  
Vol 15 (5) ◽  
pp. 1089-1124 ◽  
Author(s):  
Stefan Harmeling ◽  
Andreas Ziehe ◽  
Motoaki Kawanabe ◽  
Klaus-Robert Müller

We propose kTDSEP, a kernel-based algorithm for nonlinear blind source separation (BSS). It combines complementary research fields: kernel feature spaces and BSS using temporal information. This yields an efficient algorithm for nonlinear BSS with invertible nonlinearity. Key assumptions are that the kernel feature space is chosen rich enough to approximate the nonlinearity and that signals of interest contain temporal information. Both assumptions are fulfilled for a wide set of real-world applications. The algorithm works as follows: First, the data are (implicitly) mapped to a high-dimensional (possibly infinite-dimensional) kernel feature space. In practice, however, the data form a smaller submanifold in feature space (even smaller than the number of training data points), a fact that has already been used by, for example, reduced set techniques for support vector machines. We propose to adapt to this effective dimension as a preprocessing step and to construct an orthonormal basis of this submanifold. The latter dimension-reduction step is essential for making the subsequent application of BSS methods computationally and numerically tractable. In the reduced space, we use a BSS algorithm that is based on second-order temporal decorrelation. Finally, we propose a selection procedure to obtain the original sources from the extracted nonlinear components automatically. Experiments demonstrate the excellent performance and efficiency of our kTDSEP algorithm for several problems of nonlinear BSS and for more than two sources.
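The second-order temporal-decorrelation core that kTDSEP applies after its kernel-space dimension reduction can be sketched in the linear case: whiten the mixtures with the zero-lag covariance, then diagonalize a time-lagged covariance (a single-lag, AMUSE/TDSEP-style simplification of the full joint diagonalization). The toy sources and mixing matrix are invented for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
t = np.arange(n)
# Two temporally structured sources, linearly mixed.
S = np.vstack([np.sin(0.01 * t), np.sign(np.sin(0.0031 * t))])
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Whitening from the zero-lag covariance.
d, E = np.linalg.eigh(np.cov(X))
Z = (E @ np.diag(d ** -0.5) @ E.T) @ X

# Diagonalize one time-lagged covariance (sources have distinct
# autocorrelations at this lag, which makes them identifiable).
tau = 100
C_tau = Z[:, :-tau] @ Z[:, tau:].T / (n - tau)
C_tau = (C_tau + C_tau.T) / 2.0       # symmetrize
_, V = np.linalg.eigh(C_tau)
Y = V.T @ Z                           # recovered sources (up to order/sign)

# Each recovered component correlates strongly with one true source.
corr = np.corrcoef(np.vstack([Y, S]))[:2, 2:]
print(np.round(np.abs(corr), 2))
```

kTDSEP runs this same second-order machinery not on the raw mixtures but on the orthonormal-basis coordinates of the data's submanifold in kernel feature space, which is what lets it undo an invertible nonlinearity.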


Author(s):  
Benjamin I. P. Rubinstein ◽  
Peter L. Bartlett ◽  
Ling Huang ◽  
Nina Taft

The ubiquitous need for analyzing privacy-sensitive information—including health records, personal communications, product ratings and social network data—is driving significant interest in privacy-preserving data analysis across several research communities. This paper explores the release of Support Vector Machine (SVM) classifiers while preserving the privacy of training data. The SVM is a popular machine learning method that maps data to a high-dimensional feature space before learning a linear decision boundary. We present efficient mechanisms for finite-dimensional feature mappings and for (potentially infinite-dimensional) mappings with translation-invariant kernels. In the latter case, our mechanism borrows a technique from large-scale learning to learn in a finite-dimensional feature space whose inner-product uniformly approximates the desired feature space inner-product (the desired kernel) with high probability. Differential privacy is established using algorithmic stability, a property used in learning theory to bound generalization error. Utility—when the private classifier is pointwise close to the non-private classifier with high probability—is proven using smoothness of regularized empirical risk minimization with respect to small perturbations to the feature mapping. Finally we conclude with lower bounds on the differential privacy of any mechanism approximating the SVM.
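The "technique from large-scale learning" for translation-invariant kernels is random Fourier features: an explicit finite-dimensional map whose inner products uniformly approximate the kernel with high probability. A minimal numpy sketch for the RBF kernel (the privacy mechanism itself, i.e. the calibrated noise addition, is omitted):

```python
import numpy as np

def rff_map(X, n_features=2000, gamma=1.0, seed=0):
    """Random Fourier features: <phi(x), phi(y)> ~= exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's Fourier transform (a Gaussian).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))
Phi = rff_map(X)

exact = np.exp(-1.0 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
approx = Phi @ Phi.T
print(np.max(np.abs(exact - approx)))   # small approximation error
```

Training an SVM on `Phi` reduces the infinite-dimensional problem to a finite-dimensional one, where noise can then be added to the learned weight vector to obtain differential privacy.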


2016 ◽  
Vol 136 (12) ◽  
pp. 898-907 ◽  
Author(s):  
Joao Gari da Silva Fonseca Junior ◽  
Hideaki Ohtake ◽  
Takashi Oozeki ◽  
Kazuhiko Ogimoto

Author(s):  
Jianfeng Jiang

Objective: To diagnose analog circuit faults correctly, an analog circuit fault diagnosis approach based on wavelet-based fractal analysis and a multiple kernel support vector machine (MKSVM) is presented in this paper. Methods: Time responses of the circuit under different faults are measured, and wavelet-based fractal analysis is used to process the collected time responses to generate signal features. Kernel principal component analysis (KPCA) is applied to reduce the features’ dimensionality. Afterwards, the features are divided into training data and testing data. An MKSVM, with its multiple parameters optimized by a chaos particle swarm optimization (CPSO) algorithm, is used to construct an analog circuit fault diagnosis model from the training data. Results: The proposed diagnosis approach is demonstrated with a fault diagnosis simulation of a four-opamp biquad high-pass filter. Conclusion: The approach outperforms other commonly used methods in the comparisons.
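The multiple-kernel idea can be sketched as a convex combination of base kernels fed to an SVM through a precomputed Gram matrix. The wavelet/fractal features, the KPCA step, and the CPSO search are omitted; the fixed kernel weight and the synthetic features and labels below are illustrative stand-ins.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                       # stand-in fault features
y = (X[:, 0] + X[:, 1] ** 2 > 1.0).astype(int)     # stand-in fault labels

def mk_gram(A, B, w=0.6):
    """Convex combination of an RBF and a polynomial kernel."""
    return w * rbf_kernel(A, B, gamma=0.5) + (1 - w) * polynomial_kernel(A, B, degree=2)

clf = SVC(kernel="precomputed").fit(mk_gram(X, X), y)
train_acc = clf.score(mk_gram(X, X), y)
print(round(train_acc, 2))
```

In the paper's setup, an optimizer such as CPSO would tune the kernel weight and the per-kernel parameters instead of the fixed values used here.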


2019 ◽  
Vol 9 (22) ◽  
pp. 4749 ◽ 
Author(s):  
Lingyun Jiang ◽  
Kai Qiao ◽  
Linyuan Wang ◽  
Chi Zhang ◽  
Jian Chen ◽  
...  

Decoding human brain activities, especially reconstructing human visual stimuli via functional magnetic resonance imaging (fMRI), has gained increasing attention in recent years. However, the high dimensionality and small quantity of fMRI data impose restrictions on satisfactory reconstruction, especially for reconstruction methods based on deep learning, which require huge amounts of labelled samples. Compared with deep learning methods, humans can recognize a new image because the human visual system naturally extracts features from any object and compares them. Inspired by this visual mechanism, we introduced the mechanism of comparison into a deep learning method to realize better visual reconstruction, making full use of each sample and of the relationship within each sample pair by learning to compare. In this way, we propose a Siamese reconstruction network (SRN) method. Using the SRN, we achieved satisfying results on two fMRI recording datasets: 72.5% accuracy on the digit dataset and 44.6% accuracy on the character dataset. Essentially, this approach increases the training data from n samples to about 2n sample pairs, taking full advantage of the limited quantity of training samples. The SRN learns to converge sample pairs of the same class and disperse sample pairs of different classes in feature space.
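The compare-and-learn idea, turning individual labeled samples into labeled pairs and training with a loss that pulls same-class pairs together and pushes different-class pairs apart, can be sketched with a standard contrastive loss. The all-pairs construction and margin value below are illustrative assumptions, not the SRN's exact design.

```python
import numpy as np
from itertools import combinations

def make_pairs(X, y):
    """Turn n labeled samples into labeled pairs (1 = same class, 0 = different)."""
    return [(X[i], X[j], int(y[i] == y[j]))
            for i, j in combinations(range(len(X)), 2)]

def contrastive_loss(za, zb, same, margin=1.0):
    """Penalize distant same-class pairs and close different-class pairs."""
    d = np.linalg.norm(za - zb)
    return same * d ** 2 + (1 - same) * max(0.0, margin - d) ** 2

# Toy embeddings: two nearby same-class points and one distant other-class point.
X = np.array([[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]])
y = np.array([0, 0, 1])
pairs = make_pairs(X, y)
losses = [contrastive_loss(a, b, s) for a, b, s in pairs]
print(len(pairs), [round(l, 2) for l in losses])
```

Training a shared (Siamese) encoder to minimize this loss over pairs is what multiplies the effective training signal beyond the n individual samples.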

