Research on MDS and Semi-supervised Clustering Algorithm

Abstract: Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations, while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm using a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to deal with the challenges in studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which a subset of features is explanatory to the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated the superior performance of CSMR over the other methods: CSMR is powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms to different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical representations of the disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine.
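The abstract names two ingredients: a classification EM (hard-assignment) loop and a sparsity penalty applied per mixture component. The following is a minimal sketch of that general idea, not the authors' CSMR implementation. It assumes equal mixing proportions and equal noise variance across components, uses a plain L1 (lasso) penalty fitted by coordinate descent, and every function and parameter name (`lasso_cd`, `cem_sparse_mixture`, `lam`, `n_sweeps`) is hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, beta=None, n_sweeps=50):
    """Coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p) if beta is None else beta.copy()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]      # drop feature j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]      # add the updated feature back
    return beta

def cem_sparse_mixture(X, y, k=2, lam=0.1, n_iter=20, seed=0):
    """Hard-assignment EM for a mixture of k sparse linear regressions.

    Simplifying assumption: equal proportions and variances, so the
    classification step reduces to picking the smallest squared residual.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    labels = rng.integers(k, size=n)        # random initial partition
    betas = np.zeros((k, p))
    for _ in range(n_iter):
        # M-step: refit one sparse regression per component (warm start).
        for c in range(k):
            idx = labels == c
            if idx.sum() > 0:
                betas[c] = lasso_cd(X[idx], y[idx], lam, betas[c])
        # C-step: reassign each sample to its best-fitting component.
        sq_err = (y[:, None] - X @ betas.T) ** 2
        new_labels = sq_err.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels, betas
```

Calling `cem_sparse_mixture(X, y, k=2, lam=0.1)` returns one label per sample and one coefficient vector per component; zero entries in a coefficient vector mark features that the component does not use. Like any hard-assignment scheme, the result depends on the random initial partition.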


A semi-supervised clustering algorithm that integrates heterogeneous dissimilarities and data sources

The 2011 International Joint Conference on Neural Networks, 2011
DOI: 10.1109/ijcnn.2011.6033433

Author(s): Manuel Martin-Merino

Keyword(s): Clustering Algorithm, Data Sources, Supervised Clustering


A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions

The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006
DOI: 10.1109/ijcnn.2006.1716175

Author(s): Jing Li, Bao-Liang Lu

Keyword(s): Clustering Algorithm, Zero Crossing, Modular Network, Supervised Clustering


Supervised Clustering Based on DPClusO: Prediction of Plant-Disease Relations Using Jamu Formulas of KNApSAcK Database

BioMed Research International, 2014, Vol 2014, pp. 1-15
DOI: 10.1155/2014/831751
Cited By ~ 13

Author(s): Sony Hartono Wijaya, Husnawati Husnawati, Farit Mochamad Afendi, Irmanida Batubara, Latifah K. Darusman, ...

Keyword(s): Clustering Algorithm, International Classification Of Diseases, Network Clustering, Supervised Clustering, Traditional Medicines, Scientific Principles, Classification Of Diseases, Dominant Disease, Highly Correlated, Matching Score

Indonesia has the largest number of medicinal plant species in the world, and these plants are used as Jamu medicines. Jamu medicines are popular traditional medicines from Indonesia, and we need to systematize the formulation of Jamu and develop basic scientific principles of Jamu to meet the requirements of the Indonesian Healthcare System. We propose a new approach to predict the relation between plant and disease using network analysis and supervised clustering. As a preliminary step, we assigned 3138 Jamu formulas to 116 diseases of the International Classification of Diseases (ver. 10), which belong to 18 classes of disease from the National Center for Biotechnology Information. The correlation measures between Jamu pairs were determined based on their ingredient similarity. Networks were constructed and analyzed by selecting highly correlated Jamu pairs. Clusters were then generated using the network clustering algorithm DPClusO. Using the matching score of a cluster, the dominant disease and the high-frequency plants associated with the cluster were determined. The plant-to-disease relations predicted by our method were evaluated in the context of previously published results and were found to produce around 90% successful predictions.
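The pipeline in this abstract (pairwise ingredient similarity, a network of highly correlated pairs, clusters, then a dominant disease per cluster) can be illustrated with a small sketch. Several pieces are assumptions rather than the paper's method: Jaccard similarity stands in for the unspecified correlation measure, connected components stand in for DPClusO (which can produce overlapping clusters), the matching score is taken to be the share of formulas in a cluster labeled with its most common disease, and the formula names, ingredients, and disease labels are made-up toy data.

```python
from collections import Counter
from itertools import combinations

def jaccard(a, b):
    """Ingredient overlap between two formulas (assumed similarity measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def build_network(formulas, threshold):
    """Link every pair of formulas whose similarity reaches the threshold."""
    edges = {name: set() for name in formulas}
    for u, v in combinations(formulas, 2):
        if jaccard(formulas[u], formulas[v]) >= threshold:
            edges[u].add(v)
            edges[v].add(u)
    return edges

def connected_components(edges):
    """Stand-in clustering step: each connected component is one cluster."""
    seen, clusters = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(edges[node] - comp)
        seen |= comp
        clusters.append(comp)
    return clusters

def dominant_disease(cluster, disease_of):
    """Most common disease in a cluster and its share (assumed matching score)."""
    counts = Counter(disease_of[f] for f in cluster)
    disease, hits = counts.most_common(1)[0]
    return disease, hits / len(cluster)

# Made-up toy data, for illustration only.
formulas = {
    "J1": {"ginger", "turmeric"},
    "J2": {"ginger", "turmeric", "tamarind"},
    "J3": {"galangal", "lemongrass"},
    "J4": {"galangal", "lemongrass", "clove"},
}
disease_of = {"J1": "digestive", "J2": "digestive",
              "J3": "respiratory", "J4": "respiratory"}

clusters = connected_components(build_network(formulas, threshold=0.5))
for cluster in clusters:
    print(sorted(cluster), dominant_disease(cluster, disease_of))
```

In this toy setting the plants that occur most often within a cluster would then be associated with that cluster's dominant disease, which mirrors the plant-to-disease step described in the abstract.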


A New Supervised Clustering Algorithm Based on Min-Max Modular Network with Gaussian-Zero-Crossing Functions

The 2006 IEEE International Joint Conference on Neural Network Proceedings, 2006
DOI: 10.1109/ijcnn.2006.246764

Author(s): Jing Li, Bao-Liang Lu

Keyword(s): Clustering Algorithm, Zero Crossing, Modular Network, Supervised Clustering


A supervised clustering algorithm for computer intrusion detection

Knowledge and Information Systems, 2005, Vol 8 (4), pp. 498-509
DOI: 10.1007/s10115-005-0195-8
Cited By ~ 8

Author(s): Xiangyang Li, Nong Ye

Keyword(s): Intrusion Detection, Clustering Algorithm, Supervised Clustering, Computer Intrusion Detection


GRAPH BASED CLUSTERING WITH CONSTRAINTS AND ACTIVE LEARNING

Journal of Computer Science and Cybernetics, 2021, Vol 37 (1), pp. 71-89
DOI: 10.15625/1813-9663/37/1/15773

Author(s): Vu-Tuan Dang, Viet-Vu Vu, Hong-Quan Do, Thi Kieu Oanh Le

Keyword(s): Active Learning, Clustering Algorithm, Side Information, Clustering Algorithms, Real Data, Data Sets, Data Set, Supervised Clustering, Class Labels, Graph Based Clustering

During the past few years, semi-supervised clustering has emerged as an interesting new direction in machine learning research. In a semi-supervised clustering algorithm, the clustering results can be significantly improved by using side information, which is either already available or collected from users. Two main kinds of side information can be used in semi-supervised clustering algorithms: class labels (called seeds) and pairwise constraints. The first semi-supervised clustering algorithm was introduced in 2000, and since then many algorithms have been presented in the literature. However, it is not easy to use both types of side information in the same algorithm. To address this problem, this paper proposes a semi-supervised graph-based clustering algorithm, called MCSSGC, that uses both seeds and constraints in the clustering process. Moreover, we introduce a simple but efficient active learning method, named KMMFFQS, to collect constraints that can boost the performance of MCSSGC. To verify the effectiveness of the proposed algorithm, we conducted a series of experiments not only on real data sets from UCI, but also on a document data set used in information extraction from Vietnamese documents. The results show that the proposed algorithm can significantly improve the clustering process compared to some recent algorithms.
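The two kinds of side information mentioned above can be made concrete with a short sketch. It is not MCSSGC or KMMFFQS. It only shows one common way to put seeds and pairwise constraints on the same footing: seeds with the same label imply a must-link pair, seeds with different labels imply a cannot-link pair, and a candidate clustering can be scored by how many constraints it violates. The function names and the toy seeds are hypothetical.

```python
from itertools import combinations

def seeds_to_constraints(seeds):
    """Turn labeled seeds {point_index: class_label} into pairwise constraints."""
    must_link, cannot_link = set(), set()
    for (i, label_i), (j, label_j) in combinations(sorted(seeds.items()), 2):
        if label_i == label_j:
            must_link.add((i, j))       # same class: must share a cluster
        else:
            cannot_link.add((i, j))     # different class: must be separated
    return must_link, cannot_link

def count_violations(assignment, must_link, cannot_link):
    """Number of constraints that a cluster assignment breaks."""
    bad_must = sum(assignment[i] != assignment[j] for i, j in must_link)
    bad_cannot = sum(assignment[i] == assignment[j] for i, j in cannot_link)
    return bad_must + bad_cannot

# Toy example: points 0 and 3 are seeds of class "a", point 5 of class "b".
seeds = {0: "a", 3: "a", 5: "b"}
must_link, cannot_link = seeds_to_constraints(seeds)

good = [0, 0, 0, 0, 1, 1]   # cluster id per point
bad = [0, 0, 0, 1, 1, 1]
print(count_violations(good, must_link, cannot_link))  # 0
print(count_violations(bad, must_link, cannot_link))   # 2
```

A violation count of this kind can serve as a penalty term or as an evaluation check, whichever clustering algorithm produces the assignment.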


Research on MDS and Semi-supervised Clustering Algorithm

Automated Constraint Selection for Semi-supervised Clustering Algorithm

Pruning Training Samples Using a Supervised Clustering Algorithm

A Novel Supervised Clustering Algorithm for Transportation System Applications

Supervised clustering of high-dimensional data using regularized mixture modeling

