Optimization algorithm for omic data subspace clustering

2021
Author(s): Madalina Ciortan, Matthieu Defrance

2021
Author(s): Leonardo Duarte Rodrigues Alexandre, Rafael S. Costa, Rui Henriques

Motivation: Pattern discovery and subspace clustering play a central role in the biological domain, supporting, for instance, the discovery of putative regulatory modules from omic data for both descriptive and predictive ends. In the presence of target variables (e.g. phenotypes), regulatory patterns should further satisfy discriminative power properties, which are well established for categorical outcomes yet largely disregarded for numerical outcomes, such as risk profiles and quantitative phenotypes. Results: DISA (Discriminative and Informative Subspace Assessment), a Python software package, is proposed to assess patterns in the presence of numerical outcomes using well-established measures together with a novel principle able to statistically assess the correlation gain of the subspace against the overall space. Results confirm that discriminative criteria can be soundly extended to numerical outcomes without the drawbacks commonly associated with discretization procedures. A case study is provided to show the properties of the proposed method. Availability: DISA is freely available at https://github.com/JupitersMight/DISA under the MIT license.


2020, Vol 10 (24), pp. 8942
Author(s): Jing Liu, Yanfeng Sun, Yongli Hu

The deep subspace clustering method, which adopts deep neural networks to learn a representation matrix for subspace clustering, has shown good performance. However, this representation matrix ignores the structural constraint when it is applied to subspace clustering. It is known that samples from different classes can be regarded as embedded in independent subspaces. Thus, the representation matrix should have a block diagonal structure. This paper presents Deep Subspace Clustering with Block Diagonal Constraint (DSC-BDC), a model which constrains the representation matrix to a block diagonal structure and provides a block diagonal regularizer for learning a suitable representation. Furthermore, to enhance the representation capacity, DSC-BDC reforms the block diagonal structure constraint by performing a separation strategy on the representation matrix. Specifically, the separation strategy ensures that the most compact samples are selected to represent the data. An alternative optimization algorithm is designed for our model. Extensive experiments on four public, real-world databases demonstrate the effectiveness and superiority of our proposed model.
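To make the block diagonal idea concrete, the following is a minimal sketch of one commonly used block diagonal penalty: the sum of the k smallest eigenvalues of the graph Laplacian built from the representation matrix. This is an assumption chosen for illustration, and the abstract does not state that DSC-BDC uses this exact regularizer. The function name `block_diagonal_penalty` and the toy matrices are hypothetical. The sketch relies on the standard fact that a graph with k connected components has a Laplacian with k zero eigenvalues.

```python
import numpy as np

def block_diagonal_penalty(C, k):
    """Sum of the k smallest Laplacian eigenvalues of the affinity built from C.

    Hypothetical helper for illustration only; not the DSC-BDC implementation.
    The value is close to zero when the affinity has at least k disconnected
    blocks and grows as the blocks become connected.
    """
    W = (np.abs(C) + np.abs(C).T) / 2.0      # symmetric, non-negative affinity
    L = np.diag(W.sum(axis=1)) - W           # unnormalized graph Laplacian
    eigenvalues = np.linalg.eigvalsh(L)      # returned in ascending order
    return float(eigenvalues[:k].sum())

# Toy representation matrices (assumed data): two clean blocks vs. fully dense.
block = np.array([[1.0, 1.0, 0.0, 0.0],
                  [1.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 1.0, 1.0]])
dense = np.ones((4, 4))

penalty_block = block_diagonal_penalty(block, k=2)  # close to zero
penalty_dense = block_diagonal_penalty(dense, k=2)  # clearly positive
```

In a training loop, a term of this kind would typically be added to the reconstruction and self-expression losses, with k set to the expected number of clusters.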


2021
Author(s): Madalina Ciortan, Matthieu Defrance

Subspace clustering identifies multiple feature subspaces embedded in a dataset together with the underlying sample clusters. When applied to omic data, subspace clustering is a challenging task, as additional problems have to be addressed: the curse of dimensionality, imperfect data quality and cluster separation, the presence of multiple subspaces representative of divergent views of the dataset, and the lack of consensus on the best clustering method. First, we propose a computational method, discover, to perform subspace clustering on high-dimensional tabular data by maximizing the internal clustering score (i.e. cluster compactness) of feature subspaces. Our algorithm can be used in both unsupervised and semi-supervised settings. Second, by applying our method to a large set of omic datasets (i.e. microarray, bulk RNA-seq, scRNA-seq), we show that the subspace corresponding to the provided ground-truth annotations is rarely the most compact one, contrary to what is assumed by methods that maximize the internal quality of clusters. Our results highlight the difficulty of fully validating subspace clusters, which stems from the lack of feature annotations. Tested on identifying the ground-truth subspace, our method compared favorably with competing techniques on all datasets. Finally, we propose a suite of techniques to interpret the clustering results biologically in the absence of annotations. We demonstrate that subspace clustering can provide biologically meaningful sample-wise and feature-wise information, typically missed by traditional methods.
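The idea of searching for a feature subspace that maximizes an internal clustering score can be sketched as follows. This is not the authors' algorithm: it is a simple greedy forward selection, assumed here for illustration, that scores a subspace by the ratio of between-cluster to within-cluster variance for a given sample labeling (loosely corresponding to the semi-supervised setting). The names `compactness` and `greedy_subspace` and the synthetic data are hypothetical.

```python
import numpy as np

def compactness(X, labels):
    """Between-cluster over within-cluster sum of squares (higher = more compact)."""
    overall = X.mean(axis=0)
    within, between = 0.0, 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        center = members.mean(axis=0)
        within += ((members - center) ** 2).sum()
        between += len(members) * ((center - overall) ** 2).sum()
    return between / (within + 1e-12)  # small constant avoids division by zero

def greedy_subspace(X, labels, max_features):
    """Add one feature at a time while the compactness score keeps improving."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = [compactness(X[:, selected + [j]], labels) for j in remaining]
        top = int(np.argmax(scores))
        if scores[top] <= best:
            break  # no candidate feature improves the score
        best = scores[top]
        selected.append(remaining.pop(top))
    return selected, best

# Synthetic data (assumed): 100 samples, 6 features, only features 0 and 1
# separate the two groups; the other four are pure noise.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 50)
X = rng.normal(size=(100, 6))
X[labels == 1, :2] += 5.0

selected, score = greedy_subspace(X, labels, max_features=3)
```

A greedy search is only one possible strategy; it is cheap but can miss subspaces whose features are informative only in combination, which is one reason the most compact subspace need not match the annotated one.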

