Microarray Data Mining

Author(s):  
Giulia Bruno ◽  
Alessandro Fiori

Microarray technology is a powerful tool to analyze thousands of gene expression values with a single experiment. Because of the huge amount of data produced, most recent studies focus on analyzing microarray data and extracting useful and interesting information from it. Examples of applications include detecting genes highly correlated to diseases, selecting genes which show a similar behavior under specific conditions, building models to predict the disease outcome based on genetic profiles, and inferring regulatory networks. This chapter presents a review of four popular data mining techniques (i.e., Classification, Feature Selection, Clustering and Association Rule Mining) applied to microarray data. It describes the main characteristics of microarray data in order to explain the critical issues introduced by the analysis of gene expression values. Each technique is analyzed and examples from the pertinent literature are reported. Finally, prospects of data mining research on microarray data are provided.

2014 ◽  
Vol 25 (1) ◽  
pp. 29-58 ◽  
Author(s):  
Alessandro Fiori ◽  
Alberto Grand ◽  
Giulia Bruno ◽  
Francesco Gavino Brundu ◽  
Domenico Schioppa ◽  
...  

Nowadays, a huge amount of high-throughput molecular data is available for analysis and provides novel and useful insights into complex biological systems, through the acquisition of a high-resolution picture of their molecular status in defined experimental conditions. In this context, microarrays are a powerful tool to analyze thousands of gene expression values with a single experiment. A number of approaches have been developed for detecting genes highly correlated to diseases, selecting genes that exhibit a similar behavior under specific conditions, building models to predict disease outcome based on genetic profiles, and inferring regulatory networks. This paper discusses popular and recent data mining techniques (i.e., Feature Selection, Clustering, Classification, and Association Rule Mining) applied to microarray data. The main characteristics of microarray data and preprocessing procedures are presented to explain the critical issues introduced by the analysis of gene expression values. Each technique is analyzed, and relevant examples from the literature are reported. Moreover, real use cases exploiting analytic pipelines that use these methods are also introduced. Finally, future directions of data mining research on microarray data are envisioned.


2016 ◽  
pp. 1180-1211 ◽  
Author(s):  
Alessandro Fiori ◽  
Alberto Grand ◽  
Giulia Bruno ◽  
Francesco Gavino Brundu ◽  
Domenico Schioppa ◽  
...  

Nowadays, a huge amount of high-throughput molecular data is available for analysis and provides novel and useful insights into complex biological systems, through the acquisition of a high-resolution picture of their molecular status in defined experimental conditions. In this context, microarrays are a powerful tool to analyze thousands of gene expression values with a single experiment. A number of approaches have been developed for detecting genes highly correlated to diseases, selecting genes that exhibit a similar behavior under specific conditions, building models to predict disease outcome based on genetic profiles, and inferring regulatory networks. This paper discusses popular and recent data mining techniques (i.e., Feature Selection, Clustering, Classification, and Association Rule Mining) applied to microarray data. The main characteristics of microarray data and preprocessing procedures are presented to explain the critical issues introduced by the analysis of gene expression values. Each technique is analyzed, and relevant examples from the literature are reported. Moreover, real use cases exploiting analytic pipelines that use these methods are also introduced. Finally, future directions of data mining research on microarray data are envisioned.


2008 ◽  
pp. 1643-1673
Author(s):  
Jilin Han ◽  
Le Gruenwald ◽  
Tyrrell Conway

The study of gene expression levels under defined experimental conditions is an important approach to understanding how a living cell works. High-throughput microarray technology is a very powerful tool for simultaneously studying thousands of genes in a single experiment. This revolutionary technology produces an extensive amount of data, which raises an important question: how can meaningful biological information be extracted from these data? In this chapter, we survey data mining techniques that have been used for clustering, classification and association rule mining in gene expression data analysis. In addition, we provide a comprehensive list of currently available commercial and academic data mining software together with their features. Lastly, we suggest future research directions.


Author(s):  
Lei Yu ◽  
Huan Liu

The advent of gene expression microarray technology enables the simultaneous measurement of expression levels for thousands or tens of thousands of genes in a single experiment (Schena, et al., 1995). Analysis of gene expression microarray data presents unprecedented opportunities and challenges for data mining in areas such as gene clustering (Eisen, et al., 1998; Tamayo, et al., 1999), sample clustering and class discovery (Alon, et al., 1999; Golub, et al., 1999), sample class prediction (Golub, et al., 1999; Wu, et al., 2003), and gene selection (Xing, Jordan, & Karp, 2001; Yu & Liu, 2004). This article introduces the basic concepts of gene expression microarray data and describes relevant data-mining tasks. It briefly reviews the state-of-the-art methods for each data-mining task and identifies emerging challenges and future research directions in microarray data analysis.


Author(s):  
Triantafyllos Paparountas ◽  
Maria Nefeli Nikolaidou-Katsaridou ◽  
Gabriella Rustici ◽  
Vasilis Aidinis

Microarray technology enables high-throughput parallel gene expression analysis, and its use has grown exponentially thanks to the development of a variety of applications for expression, genetic and epigenetic studies. A wealth of data is now available from public repositories, providing unprecedented opportunities for meta-analysis approaches, which could generate new biological information unrelated to the original scope of individual studies. This study provides a guideline for identifying the biological significance of statistically selected, differentially expressed genes derived from gene expression arrays, and suggests further analysis pathways. The authors review the prerequisites for data mining and meta-analysis, summarize the conceptual methods to derive biological information from microarray data and suggest software for each category of data mining or meta-analysis.


2009 ◽  
pp. 45-64
Author(s):  
Gráinne Kerr ◽  
Heather Ruskin ◽  
Martin Crane

Microarray technology provides an opportunity to monitor mRNA levels of expression of thousands of genes simultaneously in a single experiment. The enormous amount of data produced by this high-throughput approach presents a challenge for data analysis: to extract meaningful patterns, to evaluate its quality, and to interpret the results. The most commonly used method of identifying such patterns is cluster analysis. Approaches that are common and sufficient for many data-mining problems, such as hierarchical and K-means clustering, do not adequately address the properties of “typical” gene expression data and fail, in significant ways, to account for its profile. This chapter clarifies some of the issues and provides a framework to evaluate clustering in gene expression analysis. Methods are categorised explicitly in the context of application to data of this type, providing a basis for reverse engineering of gene regulation networks. Finally, areas for possible future development are highlighted.


2013 ◽  
Vol 2013 ◽  
pp. 1-5 ◽  
Author(s):  
Jeyakumar Natarajan

Current microarray data mining methods such as clustering, classification, and association analysis rely heavily on statistical and machine learning algorithms for the analysis of large sets of gene expression data. In recent years, there has been a growing interest in methods that attempt to discover patterns based on multiple but related data sources. Gene expression data and the corresponding literature data are one such example. This paper suggests a new approach to microarray data mining as a combination of text mining (TM) and information extraction (IE). TM is concerned with identifying patterns in natural language text, and IE is concerned with locating specific entities, relations, and facts in text. The present paper surveys the state of the art of data mining methods for microarray data analysis. We show the limitations of current microarray data mining methods and outline how text mining could address these limitations.

