Deep learning-based clustering approaches for bioinformatics

Abstract: Clustering is central to much data-driven bioinformatics research and serves as a powerful computational method. In particular, clustering helps in analyzing unstructured and high-dimensional data in the form of sequences, expressions, texts and images. Further, clustering is used to gain insights into biological processes at the genomics level, e.g. clustering of gene expressions provides insights into the natural structure inherent in the data and helps in understanding gene functions, cellular processes, subtypes of cells and gene regulation. Accordingly, clustering approaches, including hierarchical, centroid-based, distribution-based, density-based and self-organizing maps, have long been studied and used in classical machine learning settings. In contrast, deep learning (DL)-based representation and feature learning for clustering have not been reviewed and employed extensively. Since the quality of clustering depends not only on the distribution of data points but also on the learned representation, deep neural networks can be an effective means of mapping a high-dimensional data space into a lower-dimensional feature space, leading to improved clustering results. In this paper, we review state-of-the-art DL-based approaches for cluster analysis that are based on representation learning, which we hope will be useful, particularly for bioinformatics research. Further, we explore in detail the training procedures of DL-based clustering algorithms, point out different clustering quality metrics and evaluate several DL-based approaches on three bioinformatics use cases, including bioimaging, cancer genomics and biomedical text mining. We believe this review and the evaluation results will provide valuable insights and serve as a starting point for researchers wanting to apply DL-based unsupervised methods to solve emerging bioinformatics research problems.
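
The idea of learning a lower-dimensional representation and then clustering in it can be illustrated with a minimal sketch. This is not one of the methods reviewed in the paper: it assumes a one-hidden-layer autoencoder trained with plain gradient descent, followed by a separate k-means step, on synthetic two-group data. The function names (make_data, train_autoencoder, encode, kmeans) and all hyperparameters are hypothetical choices for illustration.

```python
import numpy as np


def make_data(n_per_group=50, n_features=20, seed=0):
    """Synthetic stand-in for an expression matrix with two sample groups."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(2 * n_per_group, n_features))
    X[n_per_group:, :5] += 3.0  # second group is shifted in the first 5 features
    return (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature


def train_autoencoder(X, latent_dim=2, epochs=500, lr=0.1, seed=0):
    """Fit a tanh encoder and linear decoder by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, size=(n_features, latent_dim))
    W_dec = rng.normal(0.0, 0.1, size=(latent_dim, n_features))
    losses = []
    for _ in range(epochs):
        Z = np.tanh(X @ W_enc)   # data space -> lower-dimensional feature space
        err = Z @ W_dec - X      # reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared reconstruction error.
        grad_out = 2.0 * err / err.size
        grad_W_dec = Z.T @ grad_out
        grad_Z = grad_out @ W_dec.T
        grad_W_enc = X.T @ (grad_Z * (1.0 - Z ** 2))
        W_enc -= lr * grad_W_enc
        W_dec -= lr * grad_W_dec
    return W_enc, W_dec, losses


def encode(X, W_enc):
    """Map samples to the learned feature space."""
    return np.tanh(X @ W_enc)


def kmeans(Z, k=2, iters=50, seed=0):
    """Lloyd's algorithm with random initial centers taken from the data."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels


if __name__ == "__main__":
    X = make_data()
    W_enc, _, losses = train_autoencoder(X)
    labels = kmeans(encode(X, W_enc))
    print("reconstruction loss:", losses[0], "->", losses[-1])
    print("cluster sizes:", np.bincount(labels, minlength=2))
```

The two steps are deliberately kept separate here; methods that optimize the representation and the cluster assignment jointly would need a combined loss, which this sketch does not attempt.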

Cross Breed Clustering Algorithm for High Dimensional Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a5313.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 5049-5052

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Data Clustering ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

High Dimensional Data ◽

High Dimensional ◽

Growing Domain ◽

Present World

Clustering plays a major role in machine learning and in data mining. Deep learning is a fast-growing domain at present. The quality of clustering results can be improved by adopting deep learning algorithms. Many clustering algorithms process various datasets to get better results, but with the existing clustering algorithms it is still difficult to process high dimensional data and obtain quality clustering results. In this paper, a cross breed clustering algorithm for high dimensional data is used. Various datasets are used to obtain the results.

Subspace Clustering of High Dimensional Data Using Differential Evolution

Nature-Inspired Algorithms for Big Data Frameworks - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-5225-5852-1.ch003 ◽

2019 ◽

pp. 47-74 ◽

Cited By ~ 1

Author(s):

Parul Agarwal ◽

Shikha Mehta

Keyword(s):

Differential Evolution ◽

Distance Measure ◽

Dimensional Space ◽

Clustering Algorithms ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Dbscan Clustering ◽

Evolution Algorithms ◽

Self Adaptive

Subspace clustering approaches cluster high dimensional data in different subspaces, that is, they group the data using different relevant subsets of dimensions. This technique has become very effective because distance measures become ineffective in a high dimensional space. This chapter presents a novel evolutionary approach to bottom-up subspace clustering, SUBSPACE_DE, which is scalable to high dimensional data. SUBSPACE_DE uses a self-adaptive DBSCAN algorithm to perform clustering on the data instances of each attribute and of maximal subspaces. The self-adaptive DBSCAN clustering algorithm accepts its input from a differential evolution algorithm. The proposed SUBSPACE_DE algorithm is tested on 14 datasets, both real and synthetic, and is compared with 11 existing subspace clustering algorithms. Evaluation metrics such as F1_Measure and accuracy are used. In the performance analysis, the proposed algorithm ranks considerably better on success rate ratio ranking in both accuracy and F1_Measure. SUBSPACE_DE also has potential scalability on high dimensional datasets.
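
The effect of clustering in a relevant subset of dimensions, rather than in the full space, can be illustrated with a minimal sketch. This is not SUBSPACE_DE: it assumes a plain DBSCAN with fixed eps and min_pts (no self-adaptation and no differential evolution), and synthetic data in which only attribute 0 carries group structure. The names make_data, cluster_in_subspace and count_clusters are hypothetical.

```python
import math
import random

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, with -1 marking noise."""
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


def cluster_in_subspace(points, dims, eps, min_pts):
    """Run DBSCAN using only the attributes listed in dims."""
    projected = [[p[d] for d in dims] for p in points]
    return dbscan(projected, eps, min_pts)


def count_clusters(labels):
    return len({label for label in labels if label != NOISE})


def make_data(n_per_group=30, n_dims=5, seed=0):
    """Attribute 0 separates two groups; the other attributes are uniform noise."""
    rng = random.Random(seed)
    data = []
    for low, high in ((0.18, 0.22), (0.78, 0.82)):
        for _ in range(n_per_group):
            noise = [rng.random() for _ in range(n_dims - 1)]
            data.append([rng.uniform(low, high)] + noise)
    return data


if __name__ == "__main__":
    data = make_data()
    sub = cluster_in_subspace(data, (0,), eps=0.05, min_pts=5)
    full = cluster_in_subspace(data, range(5), eps=0.05, min_pts=5)
    print("clusters using attribute 0 only:", count_clusters(sub))
    print("clusters using all attributes:", count_clusters(full))
```

With the same eps, the groups that are dense along attribute 0 become sparse once the uniform noise attributes contribute to the distance, which is the situation subspace methods are designed to handle.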

Efficiency and Effectiveness of Clustering Algorithms for High Dimensional Data

International Journal of Computer Applications ◽

10.5120/ijca2015906144 ◽

2015 ◽

Vol 125 (11) ◽

pp. 35-40

Author(s):

Smita Chormunge ◽

Sudarson Jena

Keyword(s):

Clustering Algorithms ◽

High Dimensional Data ◽

High Dimensional ◽

Efficiency And Effectiveness

M-Denclue for Effective Data Clustering in High Dimensional Non-Linear Data

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.a9109.119119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 2925-2927

Keyword(s):

Clustering Algorithms ◽

High Dimensional Data ◽

Research Work ◽

Curse Of Dimensionality ◽

Distance Measures ◽

High Dimensional ◽

Clustering Methods ◽

Non Linear ◽

Low Dimensional ◽

Automatic Grouping

Clustering is a data mining task devoted to the automatic grouping of data based on mutual similarity. Clustering in high-dimensional spaces is a recurrent problem in many domains. It affects the time complexity, space complexity, scalability and accuracy of clustering methods. High-dimensional non-linear data usually live in different low dimensional subspaces hidden in the original space. As high-dimensional objects appear almost alike, new approaches for clustering are required. This research has focused on developing mathematical models, techniques and clustering algorithms specifically for high-dimensional data. With the growth in the fields of communication and technology, there has been tremendous growth in high dimensional data spaces. As the number of dimensions of high dimensional non-linear data increases, many clustering techniques begin to suffer from the curse of dimensionality, degrading the quality of the results. In high dimensional non-linear data, the data become very sparse and distance measures become increasingly meaningless. The principal challenge for clustering high dimensional data is to overcome the “curse of dimensionality”. This research work concentrates on devising an enhanced algorithm for clustering high dimensional non-linear data.
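
The claim that objects appear almost alike and distance measures lose meaning as dimensionality grows can be checked with a small experiment. The sketch below is an illustration only, not part of the paper: it assumes points drawn uniformly from the unit hypercube and Euclidean distance, and the function name relative_contrast is hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (d_max - d_min) / d_min for distances from one query point
    to n_points random points, all drawn uniformly from [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


if __name__ == "__main__":
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

A shrinking relative contrast means the nearest and farthest neighbors are almost equally far away, so a distance threshold or nearest-center rule separates points less and less reliably.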

A Hybrid Deep Learning-Based Unsupervised Anomaly Detection in High Dimensional Data

Computers Materials & Continua ◽

10.32604/cmc.2022.021113 ◽

2022 ◽

Vol 70 (3) ◽

pp. 5363-5381

Author(s):

Amgad Muneer ◽

Shakirah Mohd Taib ◽

Suliman Mohamed Fati ◽

Abdullateef O. Balogun ◽

Izzatdin Abdul Aziz

Keyword(s):

Deep Learning ◽

Anomaly Detection ◽

High Dimensional Data ◽

High Dimensional ◽

Unsupervised Anomaly Detection

Clustering for High Dimensional Data: Density based Subspace Clustering Algorithms

International Journal of Computer Applications ◽

10.5120/10584-5732 ◽

2013 ◽

Vol 63 (20) ◽

pp. 29-35 ◽

Cited By ~ 1

Author(s):

Sunita Jahirabadkar ◽

Parag Kulkarni

Keyword(s):

Clustering Algorithms ◽

High Dimensional Data ◽

Subspace Clustering ◽

High Dimensional ◽

Data Density

Relevant Attribute Discovery in High Dimensional Data: Application to Breast Cancer Gene Expressions

Rough Sets and Knowledge Technology - Lecture Notes in Computer Science ◽

10.1007/11795131_70 ◽

2006 ◽

pp. 482-489 ◽

Cited By ~ 11

Author(s):

Julio J. Valdés ◽

Alan J. Barton

Keyword(s):

Breast Cancer ◽

High Dimensional Data ◽

High Dimensional ◽

Cancer Gene ◽

Gene Expressions ◽

Data Application ◽

Relevant Attribute ◽

Breast Cancer Gene

Clustering High Dimensional Data Using Subspace and Projected Clustering Algorithms

International Journal of Computer Science and Information Technology ◽

10.5121/ijcsit.2010.2414 ◽

2010 ◽

Vol 2 (4) ◽

pp. 162-170 ◽

Cited By ~ 7

Author(s):

Rahmat Widia Sembiring ◽

Jasni Mohamad Zain ◽

Abdullah Embong

Keyword(s):

Clustering Algorithms ◽

High Dimensional Data ◽

High Dimensional ◽

Projected Clustering

Clustering Algorithms For High Dimensional Data – A Survey Of Issues And Existing Approaches

International Journal of Computer Science and Informatics ◽

10.47893/ijcsi.2013.1108 ◽

2013 ◽

pp. 293-299

Author(s):

B.Hari Babu ◽

N.Subash Chandra ◽

T. Venu Gopal

Keyword(s):

Dimensional Space ◽

Clustering Algorithms ◽

High Dimensional Data ◽

Microarray Gene Expression Data ◽

Distance Measures ◽

High Dimensional ◽

Data Mining Technique ◽

Microarray Gene Expression ◽

Redundancy Elimination ◽

Different Types

Clustering is the most prominent data mining technique used for grouping data into clusters based on distance measures. With the growth of high dimensional data such as microarray gene expression data, grouping such data into clusters runs into the problem that the similarity between objects in the full dimensional space is often invalid, because the data contain different types of data. The process of grouping high dimensional data into clusters is not accurate, and perhaps not up to the level of expectation, when the dimension of the dataset is high. This problem is now attracting tremendous attention in research and development. To address the performance issues of clustering high dimensional data, issues such as dimensionality reduction, redundancy elimination, subspace clustering, co-clustering and data labeling for clusters need to be analyzed and improved. In this paper, we present a brief comparison of the existing algorithms that focus mainly on clustering high dimensional data.

A Survey on Various Clustering Algorithms in High Dimensional Data

Proceedings of the International Conference on Emerging Trends in Engineering & Technology (ICETET-2015) ◽

10.3850/978-981-09-5346-1_cse-557 ◽

2015 ◽

Author(s):

M. Amina ◽

K. Syed Farook

Keyword(s):

Clustering Algorithms ◽

High Dimensional Data ◽

High Dimensional
