New Data on Robustness of Gene Expression Signatures in Leukemia: Comparison of Three Distinct Total RNA Preparation Procedures.

Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 4288-4288
Author(s):  
Marta Campo ◽  
Andrea Zangrando ◽  
Luca Trentin ◽  
Rui Li ◽  
Wei-min Liu ◽  
...  

Abstract Gene expression microarrays have been used to classify known tumor types and various hematological malignancies (Yeoh et al, Cancer Cell 2002; Kohlmann et al, Genes Chromosomes Cancer 2003), supporting the goal of soon introducing microarray analysis into the routine classification of cancer (Haferlach et al, Blood 2005). However, there are still doubts about the performance of gene expression experiments in clinical laboratory diagnosis. For instance, the quality of the starting material is a major concern in microarray technology, and there are no data on the variation in gene expression profiles resulting from different RNA extraction procedures. Here, as part of the internal multicenter MILE Study program, we assess the impact of different RNA preparation methods on gene expression data, analyzing 27 patients representative of nine different subtypes of pediatric acute leukemias. We compared the three protocols currently most used to isolate RNA for routine diagnosis (PCR assays) and microarray experiments. They are designated method A: lysis of mononuclear leukemia cells, followed by lysate homogenization and total RNA isolation; method B: TRIzol RNA isolation; and method C: TRIzol RNA isolation followed by a total RNA purification step. The methods were analyzed in triplicate for each sample (24), and three additional samples were run as technical replicates, with three data sets for each preparation (HG-U133 Plus 2.0). Method A results in better total RNA quality, as demonstrated by 3′/5′ GAPD ratios and by RNA degradation plots. Gene expression data are highly comparable between samples of the same leukemia subclass prepared with different RNA preparation methods, demonstrating that sample preparation procedures do not impair the overall signal distribution. Unsupervised analyses showed clustering of samples first by each patient’s replicate conditions, then by leukemia type, and finally by leukemia lineage.
In fact, B-ALL samples cluster together, separately from T-ALL and AML, demonstrating that clustering reflects biological differences between leukemias and that the RNA isolation method has only a secondary effect. Supervised cluster analyses also show that samples are grouped by intra-lineage features (i.e., chromosomal aberrations), confirming the clustering organization reported in recent gene expression profiling studies of acute leukemias. Our study shows that the biological features of pediatric acute leukemia classes largely exceed the variation between different total RNA sample preparation protocols. However, technical replicate analyses reveal that gene expression data from method A have the lowest degree of variation and are more reproducible and more precise than data from the other two methods. Furthermore, compared to methods B and C, method A produces more differentially expressed probe sets between distinct leukemia classes and is therefore considered the most robust RNA isolation procedure for gene expression experiments using high-density microarray technology. We therefore conclude that method A (initial homogenization of the leukemia cell lysate followed by total RNA isolation), combined with a standardized microarray analysis protocol, is highly reproducible and contributes to the robustness of gene expression data, and that this procedure is the most practical for routine laboratory use.

2015 ◽  
Vol 49 (3) ◽  
pp. 647-658 ◽  
Author(s):  
Ana Carvalho ◽  
Clara Graça ◽  
Victor Carocha ◽  
Susana Pêra ◽  
José Luís Lousada ◽  
...  

2019 ◽  
Vol 8 (3) ◽  
pp. 5366-5370

Microarray technology provides a way to measure the expression levels of tens of thousands of genes simultaneously. This is useful for prediction and decision-making in cancer treatment. Analyzing and classifying gene expression data is a complex task, and rule-based classification is used to simplify it. In this paper, a novel Boolean Rule based Classification (BRC) algorithm is proposed. Efficient and relevant Boolean rules help the Boolean Rule based Classifier model classify the test data correctly. This model is useful for drug designers. The experimental results show that in many cases the Boolean rule based classification yields more accurate results than other classical approaches.


2021 ◽  
Author(s):  
Yusuf Khan ◽  
Daniel Hammarström ◽  
Stian Ellefsen ◽  
Rafi Ahmad

Abstract Background: The biological relevance and accuracy of gene expression data depend on the adequacy of data normalization. This is due both to its role in resolving and accounting for technical variation and errors, and to its defining role in shaping the viewpoint of biological interpretations. Still, normalization is often treated in a serendipitous manner. This is especially true for the viewpoint perspective, which may be particularly decisive for conclusions in studies involving pronounced cellular plasticity. In this study, we highlight the consequences of using three fundamentally different modes of normalization for interpreting RNA-seq data from human skeletal muscle undergoing exercise-training-induced growth. Briefly, 25 participants completed 12 weeks of high-load resistance training. Muscle biopsy specimens were sampled from m. vastus lateralis before training, after two weeks of training (week 2), and after the intervention (week 12), and were subsequently analyzed using RNA-seq. Transcript counts were modeled as i) per-library-size, ii) per-total-RNA, and iii) per-sample-size (per-mg-tissue). Results: Initially, the three modes of transcript modeling led to the identification of three unique sets of stable genes, which displayed differential expression profiles. Specifically, genes showing stable expression across samples in the per-library-size dataset displayed training-associated increases in the per-total-RNA and per-sample-size datasets. These gene sets were then used for normalization of the entire dataset, providing transcript abundance estimates corresponding to each of the three biological viewpoints (i.e., per-library-size, per-total-RNA, and per-sample-size). The different normalization modes led to different conclusions, measured as training-associated changes in transcript expression.
Briefly, for 28% and 24% of the transcripts, training was associated with changes in expression in per-total-RNA and per-sample-size scenarios, but not in the per-library-size scenario. At week 2, this led to opposite conclusions for 5% of the transcripts between per-library-size and per-sample-size datasets (↑ vs. ↓, respectively). Conclusion: Scientists should be explicit with their choice of normalization strategies and should interpret the results of gene expression analyses with caution. This is particularly important for data sets involving a limited number of genes or involving growing or differentiating cellular models, where the risk of biased conclusions is pronounced.
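How a change of viewpoint can flip a conclusion is easiest to see with numbers. The sketch below is only a toy illustration, not the authors' model: all counts, library sizes, and RNA yields are invented, the function names are hypothetical, and the per-mg-tissue estimate rests on the simplifying assumption that a transcript's share of the library multiplied by the RNA yield per mg of tissue approximates its abundance per mg. The per-total-RNA mode is left out because the abstract does not say how it was computed.

```python
# Toy illustration: the same gene can appear to go down per library
# and up per mg of tissue. All numbers below are invented.

def counts_per_million(count, library_size):
    """Per-library-size viewpoint: share of the library, scaled to one million reads."""
    return count * 1_000_000 / library_size

def per_mg_tissue(cpm, rna_ng_per_mg):
    """Per-sample-size viewpoint (ASSUMPTION): library share times RNA yield per mg tissue."""
    return cpm * rna_ng_per_mg

def direction(before, after):
    """Return the direction of change between two values."""
    if after > before:
        return "up"
    if after < before:
        return "down"
    return "unchanged"

# Hypothetical measurements for one gene at two time points.
baseline = {"count": 2000, "library_size": 20_000_000, "rna_ng_per_mg": 400.0}
week2 = {"count": 2250, "library_size": 25_000_000, "rna_ng_per_mg": 500.0}

cpm_before = counts_per_million(baseline["count"], baseline["library_size"])
cpm_after = counts_per_million(week2["count"], week2["library_size"])

mg_before = per_mg_tissue(cpm_before, baseline["rna_ng_per_mg"])
mg_after = per_mg_tissue(cpm_after, week2["rna_ng_per_mg"])

library_view = direction(cpm_before, cpm_after)
tissue_view = direction(mg_before, mg_after)

print("per-library-size:", library_view)
print("per-mg-tissue:   ", tissue_view)
```

In this invented case the gene's share of the library falls while the RNA yield per mg of tissue rises faster, so the two viewpoints disagree about the direction of change.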


2013 ◽  
Vol 13 ◽  
pp. 29
Author(s):  
A. Partyka ◽  
A. Zielak-Steciwko ◽  
W. Niżański ◽  
J. Bajzert

2002 ◽  
Vol 176 (1) ◽  
pp. 71-98 ◽  
Author(s):  
A. Szabo ◽  
K. Boucher ◽  
W.L. Carroll ◽  
L.B. Klebanov ◽  
A.D. Tsodikov ◽  
...  

2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Zi-Yi Yang ◽  
Xiao-Ying Liu ◽  
Jun Shu ◽  
Hui Zhang ◽  
Yan-Qiong Ren ◽  
...  

Abstract The widespread application of microarray technology has produced a vast quantity of publicly available gene expression datasets. However, analysis of gene expression data using biostatistics and machine learning approaches is a challenging task due to (1) high noise; (2) small sample size with high dimensionality; (3) batch effects; and (4) low reproducibility of significant biomarkers. These issues reveal the complexity of gene expression data and significantly obstruct the use of microarray technology in clinical applications. Integrative analysis offers an opportunity to address these issues and provides a more comprehensive understanding of biological systems, but current methods have several limitations. This work leverages state-of-the-art machine learning developments for the integration of multiple gene expression datasets, classification, and identification of significant biomarkers. We design a novel integrative framework, MVIAm - Multi-View based Integrative Analysis of microarray data for identifying biomarkers. It applies multiple cross-platform normalization methods to aggregate multiple datasets into a multi-view dataset and utilizes a robust learning mechanism, Multi-View Self-Paced Learning (MVSPL), for gene selection in cancer classification problems. We demonstrate the capabilities of MVIAm using simulated data and studies of breast cancer and lung cancer. It can be applied flexibly and is an effective tool for facing the four challenges of gene expression data analysis. Our proposed model makes microarray integrative analysis more systematic and expands its range of applications.


Blood ◽  
2006 ◽  
Vol 108 (11) ◽  
pp. 4304-4304
Author(s):  
Wen Wei ◽  
Julie Tsai ◽  
Barbara Brady ◽  
Wei-min Liu ◽  
Xiaoying Chen ◽  
...  

Abstract Gene expression profiling (GEP) is a powerful technology for the molecular analysis of leukemia; it groups biologically defined disease entities into distinct sub-classes that can provide diagnosis, guide therapy, and even correlate with disease prognosis. The experimental procedures of microarray analysis are often cumbersome and provide ample opportunity for variability in gene expression data. We previously reported on our efforts to standardize microarray analysis across 11 participating laboratories within the international MILE study (Microarray Innovations in LEukemia), where a large dataset of over 4,000 leukemia patient samples is being generated using both Affymetrix HG-U133 Plus 2.0 and custom format microarrays. For better applicability in a routine laboratory workflow and to improve the robustness of the microarray analysis, we have now modified the microarray sample preparation protocols originally published by the manufacturer. Here we report the final results of this effort to minimize the complexity of the sample preparation protocol and to reduce the time necessary to run the assay. We designed pre-assembled kits for total RNA preparation, nucleic acid cleanup, cDNA synthesis, in vitro transcription, hybridization and staining, and wash buffers, guiding the operator through the whole process from sample preparation to microarray result generation. To further improve the ease of use of this assay, we largely minimized the overall complexity of the sample amplification and labeling procedures, as well as the target hybridization and detection procedures. For example, for the RNA amplification, cRNA labeling, and signal detection process, the number of individual reagent vials was reduced from 32 to 13. This was achieved by combining individual components into ready-to-use master mixes.
Furthermore, starting from total RNA, the time required to generate labeled and fragmented cRNA has been reduced to a convenient eight-hour work shift. Overall, compared to the original 48-hour protocol recommended by the manufacturer, the new workflow generates microarray data in 26 hours. In total, this development program included n=900 whole genome microarray tests. By comparison testing of the original and the final modified protocols, we can further demonstrate, by squared correlation coefficients, both high inter-assay (R² > 0.9) and intra-assay (R² > 0.9) reproducibility and precision of the gene expression data from this new sample preparation method. Data from cell lines, normal bone marrow, and leukemia samples representing the subclasses AML with normal karyotype or other abnormalities, AML with complex aberrant karyotype, CML, and CLL indicate that reproducible subclassification of leukemias is feasible, as all samples were predicted by a classification algorithm to belong to the same class as when they were prepared according to the original method. In conclusion, we developed a robust sample processing methodology for microarray analysis of leukemia samples that allows standardized and reproducible microarray results to be generated in multiple laboratories.
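For readers unfamiliar with the reproducibility metric mentioned above, the following is a minimal sketch of a squared Pearson correlation coefficient between two expression profiles. The signal values are invented, the function names are hypothetical, and the abstract does not state which sample pairs or signal scale were used, so this shows only the general calculation and not the study's procedure.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

def r_squared(x, y):
    """Squared correlation coefficient, the reproducibility metric named in the text."""
    return pearson_r(x, y) ** 2

# Hypothetical log2 signal values for the same probe sets under two protocols.
original_protocol = [5.1, 7.3, 9.8, 6.2, 8.4, 10.5]
modified_protocol = [5.0, 7.5, 9.6, 6.4, 8.3, 10.7]

r2 = r_squared(original_protocol, modified_protocol)
print(f"R^2 = {r2:.3f}")
```

A value close to 1 means the two profiles rank and scale the probe sets almost identically; the abstract reports R² > 0.9 as its threshold for both inter-assay and intra-assay comparisons.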


2006 ◽  
Vol 24 (18_suppl) ◽  
pp. 13046-13046 ◽  
Author(s):  
O. Oberschmidt ◽  
U. Eismann ◽  
M. M. Lahn ◽  
J. Fleeth ◽  
F. Lüdtke ◽  
...  

13046 Background: Enzastaurin (E) is an active antitumoral agent which selectively inhibits the β-isoform of protein kinase C (PKC-β). The compound blocks the enzyme’s ATP-binding site, and signal transmission is abrogated, resulting in the inhibition of neovascularization. The aim of the present study was to correlate gene expression with in vitro chemosensitivity of freshly explanted human tumor specimens. Such correlations in tumors taken directly from patients will help to rationally design subsequent clinical trials. Methods: Soft-agar colony forming assays were performed on freshly biopsied tumor cells exposed to various concentrations of E. Corresponding pieces of tumor specimens were shock-frozen and prepared for RNA isolation and cDNA generation followed by multiplex real-time PCR experiments. Gene expression data were correlated against cloning assay results. Results: Gene expression data of PKC-β1, PKC-β2, IL8RA, IL8RB, IL8, GSK3-β, and TGF-β were correlated against the in vitro chemosensitivity pattern of E from 66 samples. After 1-h drug exposure, differences in gene expression between sensitive and resistant specimens were statistically significant, with p = 0.013 for IL8 [median copy number (mcn): 1881 vs. 694; n = 66] and p = 0.012 for GSK3-β (mcn: 1.6 vs. 7.0; n = 66). No correlation was detected for PKC-β1, PKC-β2, IL8RA, and IL8RB. Detection of TGF-β failed in most samples. Conclusions: Low expression of GSK3-β and high expression of IL8 correlate statistically significantly with increased in vitro sensitivity to E in freshly explanted human tumors. These findings may help direct further clinical development of this compound. No significant financial relationships to disclose.


Author(s):  
Miao Wang ◽  
Xuequn Shang ◽  
Shaohua Zhang ◽  
Zhanhuai Li

DNA microarray technology has generated a large amount of gene expression data. Biclustering is a methodology that clusters the condition set and the gene set simultaneously. It finds clusters of genes possessing similar characteristics together with the biological conditions creating these similarities. Almost all current biclustering algorithms find biclusters in a single microarray dataset. In order to reduce the influence of noise and find more biologically relevant biclusters, the authors propose the FDCluster algorithm to mine frequent closed discriminative biclusters in multiple microarray datasets. FDCluster uses the Apriori property and several novel pruning techniques to mine biclusters efficiently. To improve space efficiency, FDCluster also utilizes several techniques to generate frequent closed biclusters without maintaining candidates in memory. The experimental results show that FDCluster is more effective than traditional methods on both single and multiple microarray datasets. This paper tests biological significance using GO to show that the proposed method is able to produce biologically relevant biclusters.


Author(s):  
Prangyaparamita Mohapatra ◽  
Tripti Swarnkar

DNA microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during biological processes and across collections of related samples. However, the large number of genes and the complexity of biological networks greatly increase the challenges of comprehending and interpreting the resulting mass of data, which often consists of millions of measurements. A first step toward addressing this challenge is the use of clustering techniques, which is essential in the data mining process to reveal natural structures and identify interesting patterns in the underlying data. Cluster analysis seeks to partition a given data set into groups based on specified features so that the data points within a group are more similar to each other than the points in different groups. Many conventional clustering algorithms have been adapted or directly applied to gene expression data, and also new algorithms have recently been proposed specifically aiming at gene expression data. These clustering algorithms have been proven useful for identifying biologically relevant groups of genes and samples. A large number of clustering approaches have been proposed for the analysis of gene expression data obtained from microarray experiments. However, the results of the application of standard clustering methods to genes are limited. These limited results are imposed by the existence of a number of experimental conditions where the activity of genes is uncorrelated. A similar limitation exists when clustering of conditions is performed. For this reason, a number of algorithms that perform simultaneous clustering on the row and column dimensions of the gene expression matrix have been proposed to date. This simultaneous clustering, usually designated by biclustering, seeks to find submatrices that are subgroups of genes and subgroups of columns, where the genes exhibit highly correlated activities for every condition. 
This type of algorithm has also been proposed and used in other fields, such as information retrieval and data mining. In this paper, we first briefly introduce the concepts of microarray technology and discuss the basic elements of clustering on gene expression data. Then, we present specific challenges pertinent to each clustering category and introduce several representative approaches.
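The motivation for biclustering described above can be illustrated with a small sketch: genes that look unrelated across all conditions may be almost perfectly correlated on a subset of them. The expression matrix, the gene and function names, and the chosen condition subset are all invented for illustration, and the code only scores a given submatrix; it is not a biclustering algorithm and does not search for the subset.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_pairwise_correlation(matrix, genes, conditions):
    """Average correlation over all gene pairs, restricted to the given conditions."""
    pairs = list(combinations(genes, 2))
    total = 0.0
    for g1, g2 in pairs:
        row1 = [matrix[g1][c] for c in conditions]
        row2 = [matrix[g2][c] for c in conditions]
        total += pearson_r(row1, row2)
    return total / len(pairs)

# Hypothetical expression matrix: rows are genes, columns are 7 conditions.
expression = {
    "geneA": [1, 2, 3, 4, 9, 1, 5],
    "geneB": [2, 4, 6, 8, 1, 9, 3],
    "geneC": [3, 4, 5, 6, 5, 5, 9],
}
genes = list(expression)
all_conditions = list(range(7))
subset_conditions = [0, 1, 2, 3]  # the candidate bicluster's columns (chosen by hand)

overall = mean_pairwise_correlation(expression, genes, all_conditions)
within = mean_pairwise_correlation(expression, genes, subset_conditions)

print(f"mean correlation, all conditions:    {overall:.2f}")
print(f"mean correlation, subset conditions: {within:.2f}")
```

Clustering the rows on all seven columns would see little shared structure here, whereas the submatrix formed by the three genes and the first four conditions is strongly coherent, which is the kind of pattern biclustering methods are designed to find.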

