A Novel Hybrid Approach for Multi-objective Bi-clustering in Microarray data

Author(s):  
Naveen Trivedi ◽  
Suvendu Kanungo

Background: Bi-clustering plays a vital role in the analysis of gene expression data produced by microarray technology. The technique clusters the rows and columns of the expression data simultaneously, determining the expression level of a set of genes under a subset of conditions or samples. The information obtained is collected as a sub-matrix of the microarray data that satisfies coherent expression patterns of a subset of genes with respect to a subset of conditions. These sub-matrices are called bi-clusters, and the overall process is called bi-clustering. In this paper, we propose a new meta-heuristic hybrid, ABC-MWOA-CC, which is based on the artificial bee colony (ABC) algorithm, a modified whale optimization algorithm (MWOA) and the Cheng and Church (CC) algorithm, to optimize the extracted bi-clusters. Because most bi-clustering techniques do not address the biological significance of the genes belonging to the extracted bi-clusters, we also validate the algorithm by examining the statistical and biological relevance of the extracted genes with respect to various conditions.
Objective: The major aim of the proposed work is to design and develop a novel hybrid multi-objective bi-clustering approach for microarray data that produces the desired number of valid bi-clusters. The extracted bi-clusters are then optimized to obtain an optimal solution.
Method: In the proposed approach, a hybrid multi-objective bi-clustering algorithm based on ABC and MWOA is used to group the data into the desired number of bi-clusters. The ABC with MWOA multi-objective optimization algorithm is then applied to optimize the solutions using a variety of fitness functions.
Results: In the analysis of the results, the multi-objective functions used for the fitness calculation, namely Volume Mean (VM), Mean of Genes (GM), Mean of Conditions (CM) and Mean of MSR (MMSR), improve the performance of the CC bi-clustering algorithm on real-life data sets such as the yeast Saccharomyces cerevisiae cell cycle gene expression data sets.
Conclusion: The effectiveness of the ABC-MWOA-CC algorithm is comprehensively demonstrated by comparing it with the well-known traditional ABC-CC, OPSM and CC algorithms in terms of VM, GM, CM and MMSR.
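
The abstract does not expand MSR. In the Cheng and Church algorithm it denotes the mean squared residue, which scores how coherent a bi-cluster is (0 means perfectly additive rows and columns). The following is a minimal sketch of that score only, not of the ABC-MWOA-CC hybrid; the function name and the toy matrix are illustrative assumptions.

```python
def mean_squared_residue(matrix, rows, cols):
    """Cheng and Church mean squared residue of the sub-matrix rows x cols."""
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(sub), len(sub[0])
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall = sum(row_means) / n_rows
    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall
            total += residue ** 2
    return total / (n_rows * n_cols)


# Toy expression matrix (hypothetical values): genes in rows, conditions in columns.
expression = [
    [1.0, 2.0, 3.0, 9.0],
    [2.0, 3.0, 4.0, 1.0],
    [4.0, 5.0, 6.0, 7.0],
    [0.0, 8.0, 1.0, 5.0],
]
# The first three genes are additive shifts of each other on the first three conditions.
coherent = mean_squared_residue(expression, rows=[0, 1, 2], cols=[0, 1, 2])
noisy = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
```

A lower score indicates a more coherent bi-cluster, so the shifted 3 x 3 block scores (numerically) zero while the full matrix does not.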

2008 ◽  
Vol 5 (2) ◽  
Author(s):  
Krzysztof Borowski ◽  
Jung Soh ◽  
Christoph W. Sensen

Summary: The need for novel methods of visualizing microarray data is growing. New perspectives are beneficial to finding patterns in expression data. The Bluejay genome browser provides an integrative way of visualizing gene expression datasets in a genomic context. We have now developed the functionality to display multiple microarray datasets simultaneously in Bluejay, in order to provide researchers with a comprehensive view of their datasets linked to a graphical representation of gene function. This will enable biologists to obtain valuable insights on expression patterns, by allowing them to analyze the expression values in relation to the gene locations as well as to compare expression profiles of related genomes or of different experiments for the same genome.


2009 ◽  
Vol 07 (04) ◽  
pp. 645-661 ◽  
Author(s):  
XIN CHEN

There is increasing interest in clustering time course gene expression data to investigate a wide range of biological processes. However, developing a clustering algorithm ideal for time course gene expression data is still challenging. As timing is an important factor in defining true clusters, a clustering algorithm should explore expression correlations between time points in order to achieve high clustering accuracy. Moreover, inter-cluster gene relationships are often desired in order to facilitate the computational inference of biological pathways and regulatory networks. In this paper, a new clustering algorithm called CurveSOM is developed to offer both of these features. It first represents each gene by a cubic smoothing spline fitted to its time course expression profile, and then groups genes into clusters by applying self-organizing map-based clustering to the resulting splines. CurveSOM has been tested on three well-studied yeast cell cycle datasets and compared with four popular programs: Cluster 3.0, GENECLUSTER, MCLUST, and SSClust. The results show that CurveSOM is a very promising tool for the exploratory analysis of time course expression data, as it is able not only to group genes into clusters with high accuracy but also to find true time-shifted correlations of expression patterns across clusters.
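
To make the idea of a time-shifted correlation concrete, the sketch below searches for the lag at which two expression profiles correlate best. It is not CurveSOM (no spline fitting or self-organizing map is involved); it is a plain lagged Pearson correlation, and the function names and toy profiles are assumptions for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lag(x, y, max_lag):
    """Return (lag, r) maximizing the correlation between x[t] and y[t + lag]."""
    best = (0, pearson(x, y))
    for lag in range(1, max_lag + 1):
        r = pearson(x[:-lag], y[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical profiles: gene_b follows gene_a with a delay of one time point.
gene_a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
gene_b = [-1.0, 0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0]
lag, r = best_lag(gene_a, gene_b, max_lag=3)
```

Here the two profiles are uncorrelated at lag 0 but match exactly once `gene_b` is shifted back by one time point, which is the kind of relationship a time-aware method is meant to surface.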


2012 ◽  
Vol 6 ◽  
pp. BBI.S10383
Author(s):  
Priscilla Rajadurai ◽  
Swamynathan Sankaranarayanan

Recently, microarray technologies have become a robust technique in the area of genomics. An important step in the analysis of gene expression data is the identification of groups of genes that show analogous expression patterns. Cluster analysis partitions a given dataset into groups based on specified features. Euclidean distance is a widely used similarity measure for gene expression data that considers the amount of change in gene expression. However, the huge number of genes and the intricacy of biological networks have greatly increased the difficulty of comprehending and interpreting the resulting groups of data, and have increased processing time. The proposed technique uses a QT-based fast 2-dimensional hierarchical clustering algorithm to perform clustering. The construction of the closest-pair data structure at each level is an important time factor, which determines the processing time of clustering. The proposed model reduces the processing time and improves the analysis of gene expression data.
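
The closest-pair step can be illustrated with a brute-force search over Euclidean distances. This is only a sketch of the underlying operation, not the fast data structure the abstract refers to; the gene names and values are hypothetical.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two expression profiles of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def closest_pair(profiles):
    """Return (name_a, name_b, distance) for the two closest profiles.

    Brute force: compares every pair, so the cost grows quadratically
    with the number of genes.
    """
    return min(
        ((a, b, euclidean(profiles[a], profiles[b]))
         for a, b in combinations(sorted(profiles), 2)),
        key=lambda item: item[2],
    )


# Hypothetical expression profiles over three conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0],
    "geneB": [1.1, 2.1, 2.9],
    "geneC": [5.0, 0.0, 1.0],
}
pair = closest_pair(profiles)
```

Repeating this search at every level of a hierarchical clustering is what makes the closest-pair structure a dominant cost, which is why a faster construction shortens the overall run.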


2020 ◽  
Vol 21 (S13) ◽  
Author(s):  
Sudipta Acharya ◽  
Laizhong Cui ◽  
Yi Pan

Background: In the field of computational biology, analyzing complex data helps to extract relevant biological information. Sample classification of gene expression data is one such popular bio-data analysis technique. However, the presence of a large number of irrelevant or redundant genes in expression data makes sample classification algorithms work inefficiently. Feature selection is a dimensionality reduction technique that helps to maximize the effectiveness of any sample classification algorithm. Recent advances in biotechnology have enriched biological data with multi-modal or multi-view information. Different 'omics' resources capture various equally important biological properties of entities. However, most existing feature selection methodologies are biased towards considering only one of the multiple biological resources. Consequently, some crucial aspects of the available biological knowledge, which could further improve feature selection efficiency, may be ignored.
Results: In the present work, we propose a Consensus Multi-View Multi-objective Clustering-based feature selection algorithm called CMVMC. Three controlled genomic and proteomic resources, namely gene expression, Gene Ontology (GO), and the protein-protein interaction network (PPIN), are utilized to build two independent views. The concept of multi-objective consensus clustering is applied within the proposed gene selection method to satisfy both incorporated views. The Multiple Tissues and Yeast gene expression data sets, from two different organisms (Homo sapiens and Saccharomyces cerevisiae, respectively), are chosen for the experiments. As the end product of CMVMC, a reduced set of relevant and non-redundant genes is found for each chosen data set. These genes finally participate in an effective sample classification.
Conclusions: The experimental study on the chosen data sets shows that the proposed feature selection method improves sample classification accuracy and reduces the gene space significantly. For the Multiple Tissues data set, CMVMC reduces the number of genes (features) from 5565 to 41, with 92.73% sample classification accuracy. For the Yeast data set, the number of genes is reduced from 2884 to 10, with 95.84% sample classification accuracy. Two internal cluster validity indices, Silhouette and Davies-Bouldin (DB), and one external validity index, Classification Accuracy (CA), are chosen for the comparative study. The reported results are further validated through a well-known biological significance test and a visualization tool.
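
As an illustration of the Silhouette index named above, the sketch below computes the mean silhouette coefficient for a labelled set of points. It is a textbook formulation under simple assumptions (Euclidean distance, at least two clusters, singleton clusters scored 0) and is not taken from the CMVMC implementation; the toy points are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Mean silhouette coefficient; assumes at least two distinct labels."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Two well-separated groups (hypothetical 2-D points).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(points, [0, 0, 1, 1])  # labels follow the groups
bad = mean_silhouette(points, [0, 1, 0, 1])   # labels cut across the groups
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest that points sit closer to another cluster than to their own.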


2005 ◽  
Vol 03 (06) ◽  
pp. 1295-1313 ◽  
Author(s):  
YOSHINORI TAMADA ◽  
HIDEO BANNAI ◽  
SEIYA IMOTO ◽  
TOSHIAKI KATAYAMA ◽  
MINORU KANEHISA ◽  
...  

Since microarray gene expression data do not contain sufficient information for estimating accurate gene networks, other biological information has been considered to improve the estimated networks. Recent studies have revealed that highly conserved proteins that exhibit similar expression patterns in different organisms have almost the same function in each organism. Such conserved proteins are also known to play similar roles in the regulation of genes. Therefore, this evolutionary information can be used to refine the regulatory relationships among genes that are estimated from gene expression data. We propose a statistical method for estimating gene networks from gene expression data by utilizing evolutionarily conserved relationships between genes. Our method simultaneously estimates the gene networks of two distinct organisms with a Bayesian network model, utilizing the evolutionary information so that the gene expression data of one organism helps to estimate the gene network of the other. We show the effectiveness of the method through the analysis of Saccharomyces cerevisiae and Homo sapiens cell cycle gene expression data. Our method was successful in estimating gene networks that capture many known relationships as well as several unknown relationships which are likely to be novel. Supplementary information is available at .


2017 ◽  
Vol 20 (2) ◽  
Author(s):  
Jorge Parraga-Alava ◽  
Mario Inostroza-Ponta

Clustering algorithms are a common method for data analysis in many scientific fields. They have become popular among biologists because they make it easy to discover similar cellular functions in gene expression data. Most approaches treat gene clustering as an optimization problem in which an ad hoc cluster quality index, defined in terms of either gene expression data or biological information, is optimized. However, these approaches may not be sufficient, since they cannot guarantee clusters with both similar expression patterns and biological coherence. In this paper, we propose a bi-objective clustering algorithm to discover clusters of genes with high levels of co-expression and biological coherence. Our approach uses a multi-objective evolutionary algorithm (MOEA) that optimizes two indices, one based on gene expression level and one based on biological functional classes. The algorithm is tested on three real-life gene expression datasets. Results show that the proposed model yields gene clusters with higher levels of co-expression and biological coherence than traditional approaches.
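
Optimizing two indices at once generally yields a set of trade-off solutions rather than a single best one. The sketch below shows the standard Pareto-dominance filter that multi-objective evolutionary algorithms rely on. It is not the authors' MOEA; the score pairs are hypothetical and both objectives are assumed to be maximized.

```python
def dominates(u, v):
    """True if u is at least as good as v on every objective and strictly
    better on at least one (both objectives are maximized)."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))


def pareto_front(candidates):
    """Keep the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]


# Hypothetical (co-expression score, biological coherence score) pairs,
# one per candidate clustering.
candidates = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
front = pareto_front(candidates)
```

In this toy set, `(0.5, 0.5)` and `(0.1, 0.1)` are dropped because `(0.6, 0.6)` is better on both objectives, while the remaining three represent different trade-offs between co-expression and biological coherence.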


2021 ◽  
Vol 7 ◽  
pp. e416
Author(s):  
Amr Mohamed AbdelAziz ◽  
Taysir Soliman ◽  
Kareem Kamal A. Ghany ◽  
Adel Sewisy

A microarray is a revolutionary tool that generates vast volumes of data describing the expression profiles of the genes under investigation; these data can be qualified as Big Data. Hadoop and Spark are efficient frameworks developed to store and analyze Big Data. Analyzing microarray data helps researchers to identify correlated genes. Clustering has been successfully applied to microarray data by grouping genes with similar expression profiles into clusters. The complex nature of microarray data has obliged clustering methods to employ multiple evaluation functions to ensure high-quality solutions, which transforms the clustering problem into a Multi-Objective Problem (MOP). A new and efficient hybrid Multi-Objective Whale Optimization Algorithm with Tabu Search (MOWOATS) was proposed to solve MOPs. In this article, MOWOATS is proposed for analyzing massive microarray datasets. Three evaluation functions have been developed to ensure an effective assessment of solutions. MOWOATS has been adapted to run in parallel using Spark over Hadoop computing clusters. The quality of the generated solutions was evaluated using different indices, such as the Silhouette and Davies–Bouldin indices. The obtained clusters were very similar to the original classes. Regarding scalability, the running time was inversely proportional to the number of computing nodes.
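
For reference, the Davies–Bouldin index used in the evaluation can be sketched as follows. This is a textbook formulation (Euclidean distance, centroid-based scatter, at least two clusters with distinct centroids), not code from MOWOATS, and the toy points are hypothetical.

```python
import math


def dist(p, q):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values mean better-separated clusters."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    centroids = {lab: [sum(c) / len(members) for c in zip(*members)]
                 for lab, members in groups.items()}
    # Scatter: mean distance of a cluster's members to its centroid.
    scatter = {lab: sum(dist(p, centroids[lab]) for p in members) / len(members)
               for lab, members in groups.items()}
    total = 0.0
    for i in groups:
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
                     for j in groups if j != i)
    return total / len(groups)


# Two well-separated groups (hypothetical 2-D points).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = davies_bouldin(points, [0, 0, 1, 1])  # labels follow the groups
bad = davies_bouldin(points, [0, 1, 0, 1])   # labels cut across the groups
```

Unlike the Silhouette index, lower is better here: the labelling that follows the two groups scores far below the one that mixes them.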


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2628
Author(s):  
Mengxing Huang ◽  
Qianhao Zhai ◽  
Yinjie Chen ◽  
Siling Feng ◽  
Feng Shu

Computation offloading is one of the most important problems in edge computing. Through computation offloading, devices can transmit computation tasks to servers for execution. However, not all computation tasks can be offloaded to servers because of limited network conditions. Therefore, it is very important to decide quickly how many tasks should be executed on servers and how many should be executed locally. Only computation tasks that are properly offloaded can improve the Quality of Service (QoS). Some existing methods focus on only a single objective, and some of the others have high computational complexity. There is still no method that balances the objectives and the complexity for universal application. In this study, a Multi-Objective Whale Optimization Algorithm (MOWOA) based on time and energy consumption is proposed to solve the optimal offloading mechanism of computation offloading in mobile edge computing. It is the first time that MOWOA has been applied in this area. To improve the quality of the solution set, crowding degrees are introduced and all solutions are sorted by crowding degree. Additionally, an improved MOWOA (MOWOA2) that uses the gravity reference point method is proposed to obtain better diversity of the solution set. Compared with some typical approaches, such as the Grid-Based Evolutionary Algorithm (GrEA), Cluster-Gradient-based Artificial Immune System Algorithm (CGbAIS), Non-dominated Sorting Genetic Algorithm III (NSGA-III), etc., MOWOA2 performs better in terms of the quality of the final solutions.
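
The abstract does not define its crowding degree. One common way to rank solutions by how isolated they are in objective space is the NSGA-II crowding distance, sketched below as an assumed stand-in rather than the paper's own measure. The (time, energy) pairs are hypothetical.

```python
def crowding_distance(objectives):
    """NSGA-II style crowding distance for a list of objective vectors.

    Boundary solutions on any objective receive infinity; interior solutions
    accumulate the normalized gap between their two neighbours.
    """
    n = len(objectives)
    distance = [0.0] * n
    if n == 0:
        return distance
    for m in range(len(objectives[0])):
        order = sorted(range(n), key=lambda i: objectives[i][m])
        lo, hi = objectives[order[0]][m], objectives[order[-1]][m]
        distance[order[0]] = distance[order[-1]] = float("inf")
        if hi == lo:
            continue  # all solutions share this objective value
        for k in range(1, n - 1):
            gap = objectives[order[k + 1]][m] - objectives[order[k - 1]][m]
            distance[order[k]] += gap / (hi - lo)
    return distance


# Hypothetical offloading plans scored as (time, energy); lower is better for both.
solutions = [(1.0, 9.0), (2.0, 7.0), (3.0, 6.5), (8.0, 1.0)]
cd = crowding_distance(solutions)
# Sort indices from least crowded to most crowded.
ranked = sorted(range(len(solutions)), key=lambda i: cd[i], reverse=True)
```

Preferring less crowded solutions when trimming the solution set keeps it spread along the time and energy trade-off instead of clustered in one region.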

