Biological Knowledge Integration in DNA Microarray Gene Expression Classification Based on Rough Set Theory

Author(s):  
D. Calvo-Dmgz ◽  
J. F. Galvez ◽  
Daniel Glez-Peña ◽  
Florentino Fdez-Riverola
2012 ◽  
Vol 9 (3) ◽  
pp. 1-17 ◽  
Author(s):  
D. Calvo-Dmgz ◽  
J. F. Gálvez ◽  
D. Glez-Peña ◽  
S. Gómez-Meire ◽  
F. Fdez-Riverola

Summary: DNA microarrays have contributed to the exponential growth of genomic and experimental data in the last decade. Researchers have used this large amount of gene expression data to seek diagnoses of diseases such as cancer using machine learning methods. In turn, explicit biological knowledge about gene functions has also grown tremendously over the last decade. This work integrates explicit biological knowledge, provided as gene sets, into the classification process by means of Variable Precision Rough Set Theory (VPRS). The proposed model is able to highlight which part of the provided biological knowledge has been important for classification. This paper presents a novel model for microarray data classification that is able to incorporate prior biological knowledge in the form of gene sets. Based on this knowledge, we transform the input microarray data into supergenes, and then we apply rough set theory to select the most promising supergenes and to derive a set of easily interpretable classification rules. The proposed model is evaluated on three breast cancer microarray datasets, obtaining successful results compared to classical classification techniques. The experimental results show that there are no significant differences between our model and classical techniques, but our model is able to provide a biologically interpretable explanation of how it classifies new samples.
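The two steps named in the summary (collapsing genes into supergenes, then applying a variable precision rough set criterion) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The summary does not say how a supergene value is computed, so averaging the member genes is an assumption made here. The function names `to_supergenes` and `beta_lower_approximation`, and the gene-set names in the usage note, are hypothetical. The second function follows the usual inclusion-degree form of VPRS: an equivalence class of samples with identical (discretized) supergene values is accepted for a decision class when at least a fraction `beta` of its members carry that class label.

```python
from collections import defaultdict
from statistics import mean


def to_supergenes(sample, gene_sets):
    """Collapse one sample's expression values into one value per gene set.

    Assumption: a supergene is the mean expression of its member genes.
    The summary does not specify the aggregation, so this is illustrative.
    """
    return {
        name: mean(sample[g] for g in genes if g in sample)
        for name, genes in gene_sets.items()
    }


def beta_lower_approximation(rows, labels, target, beta=0.8):
    """Return indices of samples in the beta-lower approximation of `target`.

    rows   : discretized supergene values, one tuple per sample
    labels : decision class of each sample
    A group of samples with identical rows (an equivalence class) is kept
    when at least a fraction `beta` of the group is labelled `target`.
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row)].append(i)

    accepted = []
    for members in classes.values():
        hits = sum(1 for i in members if labels[i] == target)
        if hits / len(members) >= beta:
            accepted.extend(members)
    return sorted(accepted)
```

For example, `to_supergenes({"BRCA1": 2.0, "TP53": 4.0}, {"dna_repair": ["BRCA1", "TP53"]})` yields one averaged value for the hypothetical `dna_repair` set. With `beta=1.0` the second function reduces to the classical rough set lower approximation, while smaller values tolerate some misclassified samples inside an equivalence class.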


Author(s):  
Joachim Petit ◽  
Nathalie Meurice ◽  
José Luis Medina-Franco ◽  
Gerald M. Maggiora

Author(s):  
Deepak Singh ◽  
Dilip Singh Sisodia ◽  
Pradeep Singh

Discretization is a popular pre-processing technique that helps a learner handle continuous-valued attributes with wide ranges. The objective of this chapter is to explore the possibilities of performance improvement on high-dimensional biomedical data by combining machine learning and evolutionary algorithms to design effective healthcare systems. To accomplish this goal, the model targets the preprocessing phase and develops a framework based on Fisher Markov feature selection and evolutionary-based binary discretization (EBD) for microarray gene expression classification. Several experiments were conducted on publicly available microarray gene expression datasets, including colon tumor, lung cancer, and prostate cancer datasets. Performance is evaluated in terms of accuracy and standard deviation and is compared with other state-of-the-art techniques. The experimental results show that the EBD algorithm performs better than other contemporary discretization techniques.
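To make the idea of evolutionary binary discretization concrete, the sketch below searches for a single cut point on one continuous feature. It is not the chapter's EBD algorithm, whose details are not given in this abstract. It swaps in a simple elitist (1+1) evolution strategy as a stand-in, assumes class labels coded as 0 and 1, and uses hypothetical names (`binarize`, `fitness`, `evolve_cut`) and an illustrative fitness function.

```python
import random


def binarize(values, cut):
    """Binary discretization: 1 if the value exceeds the cut point, else 0."""
    return [1 if v > cut else 0 for v in values]


def fitness(values, labels, cut):
    """Fraction of samples whose 0/1 label matches the binarized feature.

    Either polarity counts, so a feature that is low in class 1 scores
    as well as one that is high in class 1. (Illustrative choice.)
    """
    bits = binarize(values, cut)
    agree = sum(b == y for b, y in zip(bits, labels))
    return max(agree, len(labels) - agree) / len(labels)


def evolve_cut(values, labels, generations=200, sigma=0.5, seed=0):
    """Elitist (1+1) search for a cut point; a stand-in, not the EBD method."""
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    best = (lo + hi) / 2                      # start at the range midpoint
    best_fit = fitness(values, labels, best)
    for _ in range(generations):
        # Mutate the current cut with Gaussian noise, clamped to the data range.
        child = min(hi, max(lo, best + rng.gauss(0, sigma)))
        child_fit = fitness(values, labels, child)
        if child_fit >= best_fit:             # keep the child only if not worse
            best, best_fit = child, child_fit
    return best, best_fit
```

Because a child replaces the parent only when its fitness is at least as good, the returned fitness never drops below that of the starting midpoint. A fuller framework along the lines the abstract describes would first rank genes with a feature selection step and then evolve cut points for the retained genes.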

