A Systems Biology Approach for Unsupervised Clustering of High-Dimensional Data

Exploratory Analysis of Multiple Omics Datasets Using the Adjusted RV Coefficient

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1540 ◽

2011 ◽

Vol 10 (1) ◽

Cited By ~ 17

Author(s):

Claus-Dieter Mayer ◽

Julie Lorent ◽

Graham W Horgan

Keyword(s):

Systems Biology ◽

High Dimensional Data ◽

High Dimensional ◽

Data Sets ◽

Multivariate Statistical ◽

Data Set ◽

Analysis Of Similarity ◽

Matrix Correlation ◽

Substantial Bias ◽

Rv Coefficient

The integration of multiple high-dimensional data sets (omics data) has been a very active but challenging area of bioinformatics research in recent years. Various adaptations of non-standard multivariate statistical tools have been suggested that allow such data sets to be analyzed and visualized simultaneously. However, these methods can typically deal with only two data sets, whereas systems biology experiments often generate larger numbers of high-dimensional data sets. For this reason, we suggest an exploratory analysis of similarity between data sets as an initial analysis step. This analysis is based on the RV coefficient, a matrix correlation that can be interpreted as a generalization of the squared correlation from two single variables to two sets of variables. It has been shown before, however, that the high dimensionality of the data introduces substantial bias to the RV. We therefore introduce an alternative version, the adjusted RV, which is unbiased in the case of independent data sets. We can also show that in many situations, particularly for very high-dimensional data sets, the adjusted RV is a better estimator than previous RV versions in terms of the mean square error and the power of the independence test based on it. We demonstrate the usefulness of the adjusted RV by applying it to a collection of 19 different multivariate data sets from a systems biology experiment. The pairwise RV values between the data sets define a similarity matrix that we can use as an input to hierarchical clustering or multidimensional scaling. We show that this reveals biologically meaningful subgroups of data sets in our study.
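As a rough illustration of the matrix correlation described in this abstract, the sketch below computes the classical (unadjusted) RV coefficient and turns pairwise values into a dissimilarity matrix. It assumes that all data sets share the same samples as rows; the function names `rv_coefficient` and `rv_dissimilarity` are hypothetical, and the bias adjustment proposed by the authors is not reproduced here.

```python
import numpy as np


def rv_coefficient(x, y):
    """Classical RV coefficient between two data sets measured on the same samples.

    x has shape (n, p) and y has shape (n, q); rows are the shared samples.
    """
    # Center each variable (column) so cross-products behave like covariances.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    # n x n sample configuration matrices; their size does not grow with p or q.
    wx = x @ x.T
    wy = y @ y.T
    # For symmetric matrices, trace(wx @ wy) equals the elementwise sum below.
    return np.sum(wx * wy) / np.sqrt(np.sum(wx * wx) * np.sum(wy * wy))


def rv_dissimilarity(datasets):
    """Pairwise 1 - RV matrix for a list of (n, p_k) arrays with matching rows."""
    k = len(datasets)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            dist[i, j] = dist[j, i] = 1.0 - rv_coefficient(datasets[i], datasets[j])
    return dist
```

A matrix of this kind could then be handed to a standard routine, for example SciPy's `scipy.cluster.hierarchy.linkage` after conversion with `scipy.spatial.distance.squareform`. Using 1 - RV as the dissimilarity is an assumption of this sketch, not something the abstract specifies.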


Large Sample Covariance Matrices and High-Dimensional Data Analysis

10.1017/cbo9781107588080 ◽

2015 ◽

Cited By ~ 26

Author(s):

Jianfeng Yao ◽

Shurong Zheng ◽

Zhidong Bai

Keyword(s):

Data Analysis ◽

High Dimensional Data ◽

Covariance Matrices ◽

High Dimensional ◽

Large Sample ◽

Sample Covariance Matrices ◽

Sample Covariance ◽

High Dimensional Data Analysis


Fractal-Based Methods as a Technique for Estimating the Intrinsic Dimensionality of High-Dimensional Data: A Survey

Informatica ◽

10.15388/informatica.2016.84 ◽

2016 ◽

Vol 27 (2) ◽

pp. 257-281 ◽

Cited By ~ 5

Author(s):

Rasa Karbauskaitė ◽

Gintautas Dzemyda

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Intrinsic Dimensionality


A Fast Clustering Algorithm for Large-scale and High Dimensional Data

ACTA AUTOMATICA SINICA ◽

10.3724/sp.j.1004.2009.00859 ◽

2009 ◽

Vol 35 (7) ◽

pp. 859-866

Author(s):

Ming LIU ◽

Xiao-Long WANG ◽

Yuan-Chao LIU

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

High Dimensional Data ◽

High Dimensional


Improved negative selection algorithm for network anomaly detection on high-dimensional data

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.00805 ◽

2009 ◽

Vol 29 (3) ◽

pp. 805-807 ◽

Cited By ~ 1

Author(s):

Wen-zhong GUO ◽

Guo-long CHEN ◽

Qing-liang CHEN

Keyword(s):

Anomaly Detection ◽

Negative Selection ◽

High Dimensional Data ◽

High Dimensional ◽

Selection Algorithm ◽

Negative Selection Algorithm ◽

Network Anomaly Detection


An Advanced Mining Services in Predicting and Ranking User Vitality across Dynamic and High Dimensional Data Sets

SSRN Electronic Journal ◽

10.2139/ssrn.3395242 ◽

2019 ◽

Author(s):

Ch. Durga Bhavani ◽

Dr. A. Daveedu Raju ◽

Dr. V. Surya Narayana

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Data Sets


Outlier Detection in High Dimensional Data Based on the Anti-Hub and Regression Technique

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2017.8219 ◽

2017 ◽

Vol V (VIII) ◽

pp. 1543-1551

Author(s):

Golla Hemalatha

Keyword(s):

Outlier Detection ◽

High Dimensional Data ◽

Regression Technique ◽

High Dimensional


Approximate Cluster Heat Maps of Large High-Dimensional Data

2018 24th International Conference on Pattern Recognition (ICPR) ◽

10.1109/icpr.2018.8545519 ◽

2018 ◽

Cited By ~ 1

Author(s):

Punit Rathore ◽

James C. Bezdek ◽

Dheeraj Kumar ◽

Sutharshan Rajasegarar ◽

Marimuthu Palaniswami

Keyword(s):

High Dimensional Data ◽

High Dimensional ◽

Heat Maps


The Generalized Bayes Method for High-Dimensional Data Recognition with Applications to Audio Signal Recognition

Symmetry ◽

10.3390/sym13010019 ◽

2020 ◽

Vol 13 (1) ◽

pp. 19

Author(s):

Hsiuying Wang

Keyword(s):

Gaussian Mixture Model ◽

Mixture Model ◽

Conventional Method ◽

High Dimensional Data ◽

Audio Signal ◽

Gaussian Mixture ◽

High Dimensional ◽

Signal Recognition ◽

Bayes Method ◽

Generalized Bayes

The high-dimensional data recognition problem based on the Gaussian mixture model has useful applications in many areas, such as audio signal recognition, image analysis, and biological evolution. The expectation-maximization algorithm is a popular approach to deriving the maximum likelihood estimators of the Gaussian mixture model (GMM). An alternative solution is to adopt a generalized Bayes estimator for parameter estimation. In this study, an estimator based on the generalized Bayes approach is established. A simulation study shows that the proposed approach performs competitively with the conventional method in high-dimensional Gaussian mixture model recognition. We use a musical data example to illustrate this recognition problem. Suppose that we have audio data of a piece of music and know that the music is from one of four compositions, but we do not know exactly which composition it comes from. The generalized Bayes method shows a higher average recognition rate than the conventional method. This result shows that the generalized Bayes method is competitive with the conventional method in this real application.
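For orientation, the conventional approach mentioned in this abstract (expectation-maximization for a Gaussian mixture) can be sketched as below. This is a deliberately minimal one-dimensional, two-component version with a hypothetical function name `em_gmm_1d` and a simple min/max initialization chosen for illustration; it is not the high-dimensional setting of the paper, and the generalized Bayes estimator is not shown because the abstract does not specify it.

```python
import numpy as np


def em_gmm_1d(x, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances), each an array of length 2.
    """
    x = np.asarray(x, dtype=float)
    # Illustrative initialization: equal weights, means at the data extremes.
    weights = np.array([0.5, 0.5])
    means = np.array([x.min(), x.max()])
    variances = np.array([x.var(), x.var()])
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each component.
        dens = np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2.0 * np.pi * variances)
        resp = weights * dens
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate parameters from the responsibility-weighted data.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances
```

A fixed iteration count keeps the sketch short; a practical implementation would typically monitor the log-likelihood for convergence and guard against collapsing variances.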


ResNet Autoencoders for Unsupervised Feature Learning From High-Dimensional Data: Deep Models Resistant to Performance Degradation

IEEE Access ◽

10.1109/access.2021.3064819 ◽

2021 ◽

Vol 9 ◽

pp. 40511-40520

Author(s):

Chathurika S. Wickramasinghe ◽

Daniel L. Marino ◽

Milos Manic

Keyword(s):

High Dimensional Data ◽

Feature Learning ◽

Performance Degradation ◽

High Dimensional ◽

Unsupervised Feature Learning
