A Visual Analytic for High-Dimensional Data Exploitation: The Heterogeneous Data-Reduction Proximity Tool

This paper presents the R/Bioconductor package stepwiseCM, which efficiently classifies cancer samples using two heterogeneous data sets. The algorithm captures the distinct classification power of the two data types without actually combining them. The package is suited to classification problems in which two different types of data are available for the same samples: one type is measured on all samples, the other only on some of them. The first type is easy to collect and/or relatively cheap (eg, clinical covariates), whereas the second is not (high-dimensional data, eg, gene expression). stepwiseCM has also proven useful for combining two high-dimensional data types, eg, DNA copy number and mRNA expression. The package includes functions that project neighborhood information from one data space onto the other to identify a group of samples that are likely to benefit most from measuring the second type of covariates. The two heterogeneous data spaces are connected by an indirect mapping. The crucial difference between the stepwise classification strategy implemented in this package and existing packages is that our approach aims to be cost-efficient: for a potentially large subgroup of individuals, it avoids measuring additional covariates that might be expensive or patient-unfriendly. Moreover, diagnostic test results for these individuals would be available quickly, which may shorten waiting times and hence lower patients' distress. The improvement described remedies key limitations of existing packages and facilitates the use of stepwiseCM in diverse applications.
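The neighborhood-projection idea can be illustrated with a minimal Python sketch. It is not stepwiseCM's R API or its exact scoring rule: the function names are hypothetical, Euclidean k-nearest neighbours stand in for whatever proximity measure the package uses, and it assumes each training sample already has held-out predictions from a classifier built on the cheap data and from one built on the expensive data. A test sample, which has only cheap measurements, is scored by the fraction of its nearest training neighbours (found in the cheap-data space) whose predicted class changes when the expensive data are used; high-scoring samples are the candidates for the second measurement.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reclassification_scores(
    cheap_train: Sequence[Sequence[float]],
    cheap_test: Sequence[Sequence[float]],
    pred_cheap_train: Sequence[int],
    pred_expensive_train: Sequence[int],
    k: int = 3,
) -> List[float]:
    """Hypothetical score: for each test sample, the fraction of its k nearest
    training neighbours (in the cheap-data space) whose predicted label differs
    between the cheap-data and the expensive-data classifier."""
    if not 1 <= k <= len(cheap_train):
        raise ValueError("k must be between 1 and the number of training samples")
    scores = []
    for x in cheap_test:
        # Neighbours are found with cheap data only, because the test
        # sample has no expensive measurements yet.
        order = sorted(range(len(cheap_train)), key=lambda i: euclidean(x, cheap_train[i]))
        neighbours = order[:k]
        # Count neighbours whose label flips between the two data spaces.
        flips = sum(pred_cheap_train[i] != pred_expensive_train[i] for i in neighbours)
        scores.append(flips / k)
    return scores


def select_for_second_measurement(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of test samples whose score exceeds the threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


if __name__ == "__main__":
    # Toy 1-D cheap feature; predictions flip for two training samples near 5.
    cheap_train = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    pred_cheap = [0, 0, 0, 1, 1, 1]
    pred_expensive = [0, 0, 0, 1, 0, 0]
    cheap_test = [[0.05], [5.05]]
    scores = reclassification_scores(cheap_train, cheap_test, pred_cheap, pred_expensive, k=3)
    print(select_for_second_measurement(scores, threshold=0.5))  # → [1]
```

The threshold acts as a budget knob: raising it sends fewer samples for the expensive measurement. Sorting all distances per test sample is fine for a sketch but would be replaced by a proper neighbour index for large cohorts.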

A General Framework for High-Dimensional Data Reduction Using Unsupervised Bayesian Model

Communications in Computer and Information Science - Life System Modeling and Intelligent Computing, pp. 96-101, 2010
DOI: 10.1007/978-3-642-15859-9_14
Cited By ~ 1

Author(s): Longcun Jin, Wanggen Wan, Yongliang Wu, Bin Cui, Xiaoqing Yu

Keyword(s): Data Reduction, Bayesian Model, General Framework, High Dimensional Data, High Dimensional

A Robust High-dimensional Data Reduction Method

International Journal of Virtual Reality, Vol 9 (1), pp. 55-60, 2010
DOI: 10.20870/ijvr.2010.9.1.2762
Cited By ~ 2

Author(s): Longcun Jin, Wanggen Wan, Yongliang Wu, Bin Cui, Xiaoqing Yu, ...

Keyword(s): Data Reduction, Reduction Method, Hyperspectral Image, High Dimensional Data, Cognitive Model, Pure Component, High Dimensional, Model Parameters, Reduction Algorithm, Data Reduction Method

In this paper, we propose a robust high-dimensional data reduction method. The model assumes that each pixel's reflectance results from a linear combination of pure component spectra contaminated by additive noise. The abundance parameters in this model satisfy positivity and additivity (sum-to-one) constraints. These constraints are naturally expressed in a Bayesian framework by choosing appropriate prior distributions for the abundances. The posterior distributions of the unknown model parameters are then derived. The proposed algorithm consists of two parts: a Bayesian inductive cognition part and a hierarchical reduction algorithm model. The reduction algorithm, based on the Bayesian inductive cognitive model, decides which dimensions are advantageous and outputs the recommended dimensions of the hyperspectral image. The algorithm can be interpreted as a robust reduction inference method for a Bayesian inductive cognitive model. Experimental results on high-dimensional data demonstrate useful properties of the proposed reduction algorithm.
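The linear mixing model above can be written as y = sum_r a_r * m_r + n, where each m_r is a pure component spectrum, the abundances satisfy a_r >= 0 and sum_r a_r = 1, and n is additive noise. A minimal Python sketch that generates pixels under this model follows, assuming Gaussian noise and a uniform prior on the simplex for the abundances (a common choice in Bayesian unmixing; the abstract does not say which prior the paper uses). It only simulates data; it does not implement the paper's Bayesian inductive cognition or hierarchical reduction steps, and the function names and toy spectra are hypothetical.

```python
import random
from typing import List, Sequence


def sample_abundances(n_components: int, rng: random.Random) -> List[float]:
    """Draw abundances uniformly on the simplex (Dirichlet(1, ..., 1))
    by normalising independent Exp(1) draws; the result is non-negative
    and sums to one, matching the positivity and additivity constraints."""
    e = [rng.expovariate(1.0) for _ in range(n_components)]
    total = sum(e)
    return [v / total for v in e]


def mix_pixel(
    endmembers: Sequence[Sequence[float]],
    abundances: Sequence[float],
    noise_sd: float,
    rng: random.Random,
) -> List[float]:
    """Linear mixing model: y_b = sum_r a_r * m_r[b] + n_b,
    with n_b drawn independently from N(0, noise_sd^2) for each band b."""
    n_bands = len(endmembers[0])
    return [
        sum(a * m[b] for a, m in zip(abundances, endmembers)) + rng.gauss(0.0, noise_sd)
        for b in range(n_bands)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy spectra for three pure components over four bands (made-up values).
    endmembers = [[0.9, 0.7, 0.3, 0.1], [0.2, 0.4, 0.6, 0.8], [0.5, 0.5, 0.5, 0.5]]
    a = sample_abundances(len(endmembers), rng)
    assert all(v >= 0.0 for v in a) and abs(sum(a) - 1.0) < 1e-12
    y = mix_pixel(endmembers, a, noise_sd=0.01, rng=rng)
    print("abundances:", [round(v, 3) for v in a])
    print("pixel spectrum:", [round(v, 3) for v in y])
```

Setting noise_sd to 0 reproduces the exact convex combination of the component spectra, which is a quick way to check that the constraints carry through to the simulated pixel.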