Variable selection and pattern recognition with gene expression data generated by the microarray technology

2002 ◽  
Vol 176 (1) ◽  
pp. 71-98 ◽  
Author(s):  
A. Szabo ◽  
K. Boucher ◽  
W.L. Carroll ◽  
L.B. Klebanov ◽  
A.D. Tsodikov ◽  
...  
Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 4288-4288
Author(s):  
Marta Campo ◽  
Andrea Zangrando ◽  
Luca Trentin ◽  
Rui Li ◽  
Wei-min Liu ◽  
...  

Abstract Gene expression microarrays have been used to classify known tumor types and various hematological malignancies (Yeoh et al, Cancer Cell 2002; Kohlmann et al, Genes Chromosomes Cancer 2003), reinforcing the expectation that microarray analysis could soon be introduced into the routine classification of cancer (Haferlach et al, Blood 2005). However, there are still doubts about the performance of gene expression experiments in clinical laboratory diagnosis. For instance, the quality of the starting material is a major concern in microarray technology, and there are no data on the variation in gene expression profiles arising from different RNA extraction procedures. Here, as part of the internal multicenter MILE Study program, we assess the impact of different RNA preparation methods on gene expression data by analyzing 27 patients representative of nine different subtypes of pediatric acute leukemias. We compared the three protocols currently most used to isolate RNA for routine diagnosis (PCR assays) and microarray experiments: method A, lysis of mononuclear leukemia cells followed by lysate homogenization and total RNA isolation; method B, TRIzol RNA isolation; and method C, TRIzol RNA isolation followed by a total RNA purification step. The methods were analyzed in triplicate for each sample (24), and three additional samples were run as technical replicates, with three data sets for each preparation (HG-U133 Plus 2.0). Method A results in better total RNA quality, as demonstrated by 3′/5′ GAPD ratios and by RNA degradation plots. Gene expression data are highly comparable between samples of the same leukemia subclass prepared with different RNA preparation methods, demonstrating that sample preparation procedures do not impair the overall signal distribution. Unsupervised analyses showed clustering of samples first by each patient’s replicate conditions, then by leukemia type, and finally by leukemia lineage.
In fact, B-ALL samples cluster together, separately from T-ALL and AML, demonstrating that clustering reflects biological differences between leukemias and that the RNA isolation method has only a secondary effect. Supervised cluster analyses also show that samples are grouped according to intra-lineage features (i.e. chromosomal aberrations), confirming the clustering organization reported in recent gene expression profiling studies of acute leukemias. Our study shows that the biological differences between pediatric acute leukemia classes largely exceed the variation between different total RNA sample preparation protocols. However, analyses of technical replicates reveal that gene expression data from method A have the lowest degree of variation and are more reproducible and more precise than data from the other two methods. Furthermore, compared to methods B and C, method A produces more differentially expressed probe sets between distinct leukemia classes and is therefore considered the most robust RNA isolation procedure for gene expression experiments using high-density microarray technology. We therefore conclude that method A (initial homogenization of the leukemia cell lysate followed by total RNA isolation), combined with a standardized microarray analysis protocol, is highly reproducible, contributes to the robustness of gene expression data, and is the most practical procedure for routine laboratory use.
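The abstract does not say how the "degree of variation" across technical replicates was measured. As a rough illustration only, one common choice is the coefficient of variation (standard deviation divided by the mean) per probe set, averaged within each preparation method. The sketch below assumes that metric; the function names, probe set names, and signal values are all made up to exercise the code and are not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean of the replicate signals."""
    return stdev(values) / mean(values)


def mean_cv_per_method(replicates):
    """Average the per-probe-set CV within each RNA preparation method.

    replicates: {method: {probe_set: [signal in each technical replicate]}}
    """
    return {
        method: mean(coefficient_of_variation(v) for v in probes.values())
        for method, probes in replicates.items()
    }


# Made-up signals (NOT study data): two hypothetical probe sets,
# three technical replicates per preparation method.
toy = {
    "A": {"probe_1": [100.0, 102.0, 98.0], "probe_2": [50.0, 51.0, 49.0]},
    "B": {"probe_1": [100.0, 110.0, 90.0], "probe_2": [50.0, 56.0, 44.0]},
    "C": {"probe_1": [100.0, 106.0, 94.0], "probe_2": [50.0, 53.0, 47.0]},
}

cv = mean_cv_per_method(toy)
# The method with the smallest mean CV is the most reproducible on this toy input.
most_reproducible = min(cv, key=cv.get)
```

A lower mean CV indicates tighter agreement between replicates. On real arrays the comparison would normally be made on normalized signals and over all probe sets.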


2012 ◽  
Vol 560-561 ◽  
pp. 401-409
Author(s):  
Xiao Li Yang ◽  
Si Ya Yang ◽  
Qiong He ◽  
Hong Yan Zhao

The purpose of this study was to develop a novel prediction method for breast cancer based on gene expression data, using a susceptible-marker-selectable biomimetic pattern recognition (BPR) method into which a proposed parameter increasing method (PIM) is incorporated. The method was used to predict early detection, the transition from normal to cancerous cells, and the prognosis signature of patients receiving adjuvant systemic therapy. Several genes were selected as susceptible genes associated with breast cancer. The results show that the “cognition” BPR method correctly predicted detection, cancerous cell transition, and good or poor prognosis signature with approximately 85%, 98%, and 88% accuracy, respectively. To assess the performance of BPR, Fisher discriminant analysis (FDA) and support vector machine (SVM) methods were also applied to the gene expression data. The results indicate that the BPR method is superior to FDA and SVM in classification ability. Furthermore, for every method, prediction performance can be improved by using the selected biomarkers instead of the whole gene expression data.


2013 ◽  
Vol 12 ◽  
pp. CIN.S10212 ◽  
Author(s):  
Lingkang Huang ◽  
Hao Helen Zhang ◽  
Zhao-Bang Zeng ◽  
Pierre R. Bushel

Background Microarray techniques provide promising tools for cancer diagnosis using gene expression profiles. However, molecular diagnosis based on high-throughput platforms presents great challenges due to the overwhelming number of variables versus the small sample size and the complex nature of multi-type tumors. Support vector machines (SVMs) have shown superior performance in cancer classification due to their ability to handle high dimensional low sample size data. The multi-class SVM algorithm of Crammer and Singer provides a natural framework for multi-class learning. Despite its effective performance, the procedure utilizes all variables without selection. In this paper, we propose to improve the procedure by imposing shrinkage penalties in learning to enforce solution sparsity. Results The original multi-class SVM of Crammer and Singer is effective for multi-class classification but does not conduct variable selection. We improved the method by introducing soft-thresholding type penalties to incorporate variable selection into multi-class classification for high dimensional data. The new methods were applied to simulated data and two cancer gene expression data sets. The results demonstrate that the new methods can select a small number of genes for building accurate multi-class classification rules. Furthermore, the important genes selected by the methods overlap significantly, suggesting general agreement among different variable selection schemes. Conclusions High accuracy and sparsity make the new methods attractive for cancer diagnostics with gene expression data and defining targets of therapeutic intervention. Availability The source MATLAB code is available from http://math.arizona.edu/∼hzhang/software.html.
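To make the idea of a soft-thresholding penalty concrete, here is a minimal sketch of the generic soft-thresholding operator and of how it produces sparsity. It is not the authors' MATLAB code and does not reproduce their specific penalties or the Crammer and Singer optimization. The function names, the rule that a gene is kept if any class retains a non-zero coefficient, and the numbers are all illustrative assumptions.

```python
def soft_threshold(w, lam):
    """Shrink w toward zero by lam; values with |w| <= lam become exactly 0."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0


def selected_genes(weights, lam):
    """Return indices of genes that keep a non-zero coefficient in any class.

    weights: one list of per-gene coefficients for each class (hypothetical
    values standing in for a fitted multi-class linear classifier).
    """
    n_genes = len(weights[0])
    return [
        j for j in range(n_genes)
        if any(soft_threshold(class_w[j], lam) != 0.0 for class_w in weights)
    ]


# Made-up coefficients: 2 classes x 4 genes.
weights = [
    [0.9, -0.05, 0.0, 0.3],
    [-0.4, 0.02, 0.1, -0.25],
]
kept = selected_genes(weights, lam=0.2)
```

With `lam=0.2`, only the genes at indices 0 and 3 survive, because the coefficients of the other two genes fall inside the threshold in every class. Raising `lam` yields a sparser gene set.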


2014 ◽  
Vol 13s2 ◽  
pp. CIN.S13787 ◽  
Author(s):  
Lin Zhang ◽  
Jeffrey S. Morris ◽  
Jiexin Zhang ◽  
Robert Z. Orlowski ◽  
Veerabhadran Baladandayuthapani

It is well established that the development of a disease, especially cancer, is a complex process that results from the joint effects of multiple genes involved in various molecular signaling pathways. In this article, we propose methods to discover genes and molecular pathways significantly associated with clinical outcomes in cancer samples. We exploit the natural hierarchical structure of genes related to a given pathway as a group of interacting genes to conduct selection of both pathways and genes. We cast the problem in a hierarchical structured variable selection (HSVS) framework to analyze the corresponding gene expression data. HSVS methods conduct simultaneous variable selection at the pathway (group) level and the gene (within-group) level. To adapt to the overlapping group structure present in the pathway-gene hierarchy of the data, we developed an overlap-HSVS method that introduces latent partial effect variables that partition the marginal effect of the covariates, together with corresponding weights for a proportional shrinkage of the partial effects. Combining gene expression data with prior pathway information from the KEGG databases, we identified several gene-pathway combinations that are significantly associated with clinical outcomes of multiple myeloma. Biological discoveries support this relationship for the pathways and the corresponding genes we identified.


2019 ◽  
Vol 8 (3) ◽  
pp. 5366-5370

Microarray technology provides a way to measure the expression levels of tens of thousands of genes simultaneously. This is useful for prediction and decision-making in cancer treatment. Analyzing and classifying gene expression data is a complex task. Rule-based classification is used to simplify the task of classifying genes. In this paper, a novel Boolean Rule based Classification (BRC) algorithm is proposed. Efficient and relevant Boolean rules help the Boolean Rule based Classifier model classify the test data correctly. This model is useful for drug designers. The experimental results show that in many cases the Boolean rule based classification yields more accurate results than other classical approaches.
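The abstract does not describe how the BRC algorithm derives or applies its rules. The sketch below only illustrates the general idea of Boolean rule based classification: expression values are binarized against per-gene thresholds, and a sample receives the label of the first rule whose conditions all hold. The gene names, thresholds, rules, and labels are hypothetical and are not taken from the paper.

```python
def binarize(expression, thresholds):
    """Map each gene to True (expressed above its threshold) or False."""
    return {gene: expression[gene] > cut for gene, cut in thresholds.items()}


def rule_fires(rule, state):
    """A rule is a list of (gene, required_state) literals joined by AND."""
    return all(state[gene] == required for gene, required in rule)


def classify(expression, thresholds, rules, default):
    """Return the label of the first rule that fires, else the default label."""
    state = binarize(expression, thresholds)
    for label, rule in rules:
        if rule_fires(rule, state):
            return label
    return default


# Hypothetical thresholds and rules, for illustration only.
thresholds = {"gene_a": 5.0, "gene_b": 2.0}
rules = [
    ("tumor", [("gene_a", True), ("gene_b", False)]),   # gene_a AND NOT gene_b
    ("normal", [("gene_a", False)]),                    # NOT gene_a
]

label_1 = classify({"gene_a": 7.1, "gene_b": 1.0}, thresholds, rules, "unknown")
label_2 = classify({"gene_a": 3.0, "gene_b": 4.0}, thresholds, rules, "unknown")
label_3 = classify({"gene_a": 7.1, "gene_b": 4.0}, thresholds, rules, "unknown")
```

Samples that match no rule fall through to the default label. How such samples are handled, and how rules are learned and ranked, are design decisions the abstract leaves unspecified.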

