Identification of a Gene Expression Signature Common to Distinct Cancer Pathways

2012 ◽  
Vol 11 ◽  
pp. CIN.S9542 ◽  
Author(s):  
Niklaus Fankhauser ◽  
Igor Cima ◽  
Peter Wild ◽  
Wilhelm Krek

Mutations in cancer-causing genes induce changes in gene expression programs critical for malignant cell transformation. Publicly available gene expression profiles produced by modulating the expression of distinct cancer genes may therefore represent a rich resource for the identification of gene signatures common to seemingly unrelated cancer genes. We combined automatic retrieval with manual validation to obtain a data set of high-quality gene microarray profiles. This data set was used to create logical models of the signaling events underlying the observed expression changes produced by various cancer genes and allowed us to uncover previously unknown, verifiable interactions. Data clustering revealed novel sets of gene expression profiles commonly regulated by distinct cancer genes. Our method allows retrieval of significant new information and testable hypotheses from a pool of deposited cancer gene expression experiments that are otherwise not apparent or appear insignificant from single measurements. The complete results are available through a web application at http://biodata.ethz.ch/cgi-bin/geologic .

2007 ◽  
Vol 32 (1) ◽  
pp. 154-159 ◽  
Author(s):  
Li Li ◽  
Amitabha Chaudhuri ◽  
John Chant ◽  
Zhijun Tang

We have devised a novel analysis approach, percentile analysis for differential gene expression (PADGE), for identifying genes differentially expressed between two groups of heterogeneous samples. PADGE was designed to compare expression profiles of sample subgroups at a series of percentile cutoffs and to examine the trend of relative expression between sample groups as expression level increases. Simulation studies showed that PADGE has more statistical power than t-statistics, cancer outlier profile analysis (COPA) (Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, Lee C, Montie JE, Shah RB, Pienta KJ, Rubin MA, Chinnaiyan AM. Science 310: 644–648, 2005), and kurtosis (Teschendorff AE, Naderi A, Barbosa-Morais NL, Caldas C. Bioinformatics 22: 2269–2275, 2006). Application of PADGE to microarray data sets in tumor tissues demonstrated its utility in prioritizing cancer genes encoding potential therapeutic targets or diagnostic markers. A web application was developed for researchers to analyze a large gene expression data set from heterogeneous biological samples and identify differentially expressed genes between subsets of sample classes using PADGE and other available approaches. Availability: http://www.cgl.ucsf.edu/Research/genentech/padge/ .
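The idea of comparing two sample groups at a series of percentile cutoffs can be illustrated with a small sketch. This is not the published PADGE statistic: the function name `percentile_trend`, the choice of percentiles, and the toy expression values are all assumptions made for illustration. It only shows why a gene that is overexpressed in a subset of heterogeneous tumor samples stands out at high percentiles even when the medians barely differ.

```python
import numpy as np

def percentile_trend(group_a, group_b, percentiles=(50, 60, 70, 80, 90, 95)):
    """Difference in expression between two groups at each percentile cutoff.

    Hypothetical helper: an illustration of the percentile idea only,
    not the PADGE test statistic itself.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    qa = np.percentile(a, percentiles)
    qb = np.percentile(b, percentiles)
    return dict(zip(percentiles, qa - qb))

# Toy data (invented): one gene, overexpressed in only 3 of 10 tumor samples.
normal = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0]
tumor = [1.0, 1.1, 0.9, 1.0, 1.2, 0.8, 1.0, 5.0, 6.0, 7.0]

trend = percentile_trend(tumor, normal)
for pct, diff in trend.items():
    print(f"{pct}th percentile: tumor - normal = {diff:.2f}")
```

In this toy case the difference is close to zero at the median and grows with the percentile, which is the kind of trend a mean-based comparison would dilute.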


2009 ◽  
Vol 2009 ◽  
pp. 1-7 ◽  
Author(s):  
Yvonne E. Pittelkow ◽  
Susan R. Wilson

Scientific advances are raising expectations that patient-tailored treatment will soon be available. The development of resulting clinical approaches needs to be based on well-designed experimental and observational procedures that provide data to which proper biostatistical analyses are applied. Gene expression microarray and related technologies are rapidly evolving and are providing extremely large gene expression profiles containing many thousands of measurements. Choosing a subset of these gene expression measurements to include in a gene expression signature is one of the many challenges that need to be met. The choice of signature depends on many factors, including the selection of patients in the training set. The reliability and reproducibility of the resultant prognostic gene signature therefore need to be evaluated in a way that is relevant to the clinical setting. A relatively straightforward approach is based on cross-validation, with separate selection of genes at each iteration to avoid selection bias. Within this approach we developed two different methods, one based on forward selection, the other on genes that were statistically significant in all training blocks of data. We demonstrate our approach to gene signature evaluation with a well-known breast cancer data set.
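A minimal sketch of cross-validation with gene selection repeated inside every training block is given below. It is an assumption-laden illustration rather than the authors' procedure: the t-like ranking score, the nearest-centroid classifier, the fold count, and the simulated data are all choices made here. The set `stable` loosely mirrors the idea of keeping genes that are selected in all training blocks.

```python
import numpy as np

def select_top_genes(X, y, k):
    """Rank genes by an absolute two-sample t-like score; return top-k column indices."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.argsort(score)[::-1][:k]

def cross_validate(X, y, n_folds=5, k=10, seed=0):
    """Return (accuracy, genes selected in every fold)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(y))
    folds = np.array_split(order, n_folds)
    correct = 0
    selected_sets = []
    for test_idx in folds:
        train_idx = np.setdiff1d(order, test_idx)
        # Gene selection uses the training block only, never the held-out samples.
        genes = select_top_genes(X[train_idx], y[train_idx], k)
        selected_sets.append(set(genes.tolist()))
        X_train = X[train_idx][:, genes]
        c0 = X_train[y[train_idx] == 0].mean(axis=0)
        c1 = X_train[y[train_idx] == 1].mean(axis=0)
        X_test = X[test_idx][:, genes]
        # Nearest-centroid prediction on the selected genes.
        d0 = np.linalg.norm(X_test - c0, axis=1)
        d1 = np.linalg.norm(X_test - c1, axis=1)
        pred = (d1 < d0).astype(int)
        correct += int((pred == y[test_idx]).sum())
    stable = set.intersection(*selected_sets)
    return correct / len(y), stable

# Simulated data (invented): 40 patients, 200 genes, genes 0-4 carry signal.
rng = np.random.default_rng(1)
y = np.array([0, 1] * 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 3.0

accuracy, stable = cross_validate(X, y)
print(f"cross-validated accuracy: {accuracy:.2f}")
print(f"genes selected in every fold: {sorted(stable)}")
```

Selecting genes once on the full data set and then cross-validating only the classifier would let the held-out samples influence the signature, which is the selection bias the abstract refers to.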


2017 ◽  
Vol 49 (7) ◽  
pp. 355-367 ◽  
Author(s):  
Mackenzie S. Newman ◽  
Tina Nguyen ◽  
Michael J. Watson ◽  
Robert W. Hull ◽  
Han-Gang Yu

How obesity or sex may affect the gene expression profiles of human cardiac hypertrophy is unknown. We hypothesized that body-mass index (BMI) and sex can affect gene expression profiles of cardiac hypertrophy. Human heart tissues were grouped according to sex (male, female), BMI (lean: <25 kg/m²; obese: >30 kg/m²), or left ventricular hypertrophy (LVH) and non-LVH nonfailed controls (NF). We identified 24 differentially expressed (DE) genes comparing female with male samples. In the obese subgroup, there were 236 DE genes comparing LVH with NF; in the lean subgroup, there were seven DE genes comparing LVH with NF. In the female subgroup, we identified 1,320 significant genes comparing LVH with NF; in the male subgroup, there were 1,383 significant genes comparing LVH with NF. There were seven significant genes comparing obese LVH with lean NF; comparing male obese LVH with male lean NF samples, we found 106 significant genes; comparing female obese LVH with male lean NF, we found no significant genes. Using an absolute log2 fold-change > 2 or an extremely small P value (10⁻²⁰) as the criterion, we identified nine significant genes (HBA1, HBB, HIST1H2AC, GSTT1, MYL7, NPPA, NPPB, PDK4, PLA2G2A) in LVH that were also found in a published data set for ischemic and dilated cardiomyopathy in heart failure. We identified a potential gene expression signature that distinguishes between patients with high BMI or between men and women with cardiac hypertrophy. Expression of the established biomarkers atrial natriuretic peptide A (NPPA) and B (NPPB) was already significantly increased in hypertrophy compared with controls.
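The selection criterion in this abstract (a large absolute log2 fold-change or an extremely small P value) can be written as a simple filter. The sketch below assumes that the P value cutoff is applied as "at or below 10⁻²⁰", which the abstract does not state explicitly; the gene names and numbers are invented placeholders, not results from the study.

```python
def passes_criterion(log2_fold_change, p_value, fc_cutoff=2.0, p_cutoff=1e-20):
    """True if |log2 fold-change| exceeds fc_cutoff OR the P value is at or below p_cutoff.

    The 'at or below' reading of the P value cutoff is an assumption.
    """
    return abs(log2_fold_change) > fc_cutoff or p_value <= p_cutoff

# Invented placeholder results: gene -> (log2 fold-change, P value)
results = {
    "GENE_A": (2.6, 1e-4),    # passes on fold-change alone
    "GENE_B": (-0.8, 1e-25),  # passes on P value alone
    "GENE_C": (1.2, 1e-3),    # fails both
    "GENE_D": (-3.1, 1e-30),  # passes both
}

selected = sorted(g for g, (lfc, p) in results.items() if passes_criterion(lfc, p))
print(selected)
```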


Author(s):  
Christopher E. Gillies ◽  
Xiaoli Gao ◽  
Nilesh V. Patel ◽  
Mohammad-Reza Siadat ◽  
George D. Wilson

Personalized medicine is customizing treatments to a patient’s genetic profile and has the potential to revolutionize medical practice. An important process used in personalized medicine is gene expression profiling. Analyzing gene expression profiles is difficult, because there are usually few patients and thousands of genes, leading to the curse of dimensionality. To combat this problem, researchers suggest using prior knowledge to enhance feature selection for supervised learning algorithms. The authors propose an enhancement to the LASSO, a shrinkage and selection technique that induces parameter sparsity by penalizing a model’s objective function. Their enhancement gives preference to the selection of genes that are involved in similar biological processes. The authors’ modified LASSO selects similar genes by penalizing interaction terms between genes. They devise a coordinate descent algorithm to minimize the corresponding objective function. To evaluate their method, the authors created simulation data where they compared their model to the standard LASSO model and an interaction LASSO model. The authors’ model outperformed both the standard and interaction LASSO models in terms of detecting important genes and gene interactions for a reasonable number of training samples. They also demonstrated the performance of their method on a real gene expression data set from lung cancer cell lines.
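For orientation, the sketch below implements coordinate descent for the standard LASSO only, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁. It does not include the authors' penalty on interaction terms between genes, whose exact form the abstract does not give. The function names, the value of λ, and the simulated data are assumptions.

```python
import numpy as np

def soft_threshold(rho, lam):
    """Soft-thresholding operator: shrink rho toward zero by lam."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Standard LASSO via cyclic coordinate descent (assumes no all-zero columns)."""
    n, p = X.shape
    beta = np.zeros(p)
    z = (X ** 2).sum(axis=0) / n  # per-feature scaling term
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave feature j out of the current fit.
            residual = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ residual / n
            beta[j] = soft_threshold(rho, lam) / z[j]
    return beta

# Simulated data (invented): 50 samples, 20 features, only the first two matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))
true_beta = np.zeros(20)
true_beta[0], true_beta[1] = 3.0, -2.0
y = X @ true_beta + 0.1 * rng.normal(size=50)

beta = lasso_coordinate_descent(X, y, lam=0.1)
print(np.round(beta, 2))
```

Each coordinate update has a closed form, which is why coordinate descent is a common choice for L1-penalized objectives; a modified penalty such as the one described above would change the update rule but not the overall loop structure.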


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kota Fujisawa ◽  
Mamoru Shimo ◽  
Y.-H. Taguchi ◽  
Shinya Ikematsu ◽  
Ryota Miyata

Coronavirus disease 2019 (COVID-19) is raging worldwide. This potentially fatal infectious disease is caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). However, the complete mechanism of COVID-19 is not well understood. Therefore, we analyzed gene expression profiles of COVID-19 patients to identify disease-related genes through an innovative machine learning method that enables a data-driven strategy for gene selection from a data set with a small number of samples and many candidates. Principal-component-analysis-based unsupervised feature extraction (PCAUFE) was applied to the RNA expression profiles of 16 COVID-19 patients and 18 healthy control subjects. The results identified 123 genes as critical for COVID-19 progression from 60,683 candidate probes, including immune-related genes. The 123 genes were enriched in binding sites for transcription factors NFKB1 and RELA, which are involved in various biological phenomena such as immune response and cell survival: the primary mediator of canonical nuclear factor-kappa B (NF-κB) activity is the heterodimer RelA-p50. The genes were also enriched in histone modification H3K36me3, and they largely overlapped the target genes of NFKB1 and RELA. We found that the overlapping genes were downregulated in COVID-19 patients. These results suggest that canonical NF-κB activity was suppressed by H3K36me3 in COVID-19 patient blood.
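The abstract does not spell out the steps of PCAUFE, so the following is only a rough sketch of a PCA-based gene selection of this general kind, under stated assumptions: principal components are computed with genes as rows, the component whose sample loadings best separate patients from controls is chosen, per-gene scores on that component are converted to P values under a Gaussian null, and genes are kept after Benjamini-Hochberg adjustment. The threshold of 0.01, the function names, and the simulated data are assumptions, not details taken from the paper.

```python
import math
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted P values."""
    p = np.asarray(pvals, dtype=float)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]  # enforce monotonicity
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted

def pca_feature_extraction(X, labels, alpha=0.01, n_pcs=5):
    """X is genes x samples. Returns indices of selected genes (illustrative only)."""
    labels = np.asarray(labels)
    # Standardize each sample (column) across genes.
    Xs = (X - X.mean(axis=0, keepdims=True)) / X.std(axis=0, keepdims=True)
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    # Vt[l] holds one loading per sample; pick the PC that best separates the classes.
    gaps = [abs(Vt[l][labels == 1].mean() - Vt[l][labels == 0].mean())
            for l in range(n_pcs)]
    best = int(np.argmax(gaps))
    scores = U[:, best] * S[best]  # one score per gene
    z = (scores - scores.mean()) / scores.std()
    # Two-sided Gaussian tail, equal to a chi-squared test on z**2 with 1 d.f.
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return np.flatnonzero(bh_adjust(pvals) < alpha)

# Simulated data (invented): 1000 genes, 10 controls + 10 patients,
# genes 0-19 shifted upward in patients.
rng = np.random.default_rng(0)
labels = np.array([0] * 10 + [1] * 10)
X = rng.normal(size=(1000, 20))
X[:20, labels == 1] += 4.0

selected = pca_feature_extraction(X, labels)
print(selected)
```

Note that the class labels are used here only to choose which component to inspect; the gene scores themselves come from the unsupervised decomposition.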


2019 ◽  
Vol 20 (23) ◽  
pp. 6098 ◽  
Author(s):  
Amarinder Singh Thind ◽  
Kumar Parijat Tripathi ◽  
Mario Rosario Guarracino

The comparison of high-throughput gene expression datasets obtained from different experimental conditions is a challenging task. It provides an opportunity to explore the cellular response to various biological events such as disease, environmental conditions, and drugs. There is a need for tools that allow the integration and analysis of such data. We developed the “RankerGUI pipeline”, a user-friendly web application for the biological community. It allows users to apply various rank-based statistical approaches to compare full differential gene expression profiles between the same or different biological states obtained from different sources. The pipeline modules are an integration of various open-source packages, a few of which are modified for extended functionality. The main modules include rank-rank hypergeometric overlap, enriched rank-rank hypergeometric overlap, and distance calculations. Additionally, preprocessing steps such as merging differential expression profiles of multiple independent studies can be added before running the main modules. Output plots show the strength, pattern, and trends among complete differential expression profiles. In this paper, we describe the various modules and functionalities of the developed pipeline. We also present a case study that demonstrates how the pipeline can be used for the comparison of differential expression profiles obtained from multiple platforms’ data of the Gene Expression Omnibus. Using these comparisons, we investigate gene expression patterns in kidney and lung cancers.
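The core computation behind a rank-rank hypergeometric overlap can be sketched as follows. This is not RankerGUI's code: the function names, the step size, and the toy gene lists are assumptions. For each pair of rank thresholds, the sketch counts the genes shared by the top of both lists and asks how likely an overlap at least that large would be by chance.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(overlap >= k) when n of N items are drawn and K of the N are marked."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def rank_rank_overlap(ranked_a, ranked_b, step=2):
    """Map each (i, j) threshold pair to the overlap P value of top-i vs top-j genes."""
    N = len(ranked_a)
    pvalues = {}
    for i in range(step, N, step):
        top_a = set(ranked_a[:i])
        for j in range(step, N, step):
            k = len(top_a & set(ranked_b[:j]))
            pvalues[(i, j)] = hypergeom_sf(k, N, i, j)
    return pvalues

# Toy ranked gene lists (invented): identical order versus reversed order.
genes = list("ABCDEFGHIJ")
same = rank_rank_overlap(genes, genes)
opposite = rank_rank_overlap(genes, genes[::-1])

print(f"identical rankings, top 4 vs top 4: p = {same[(4, 4)]:.4f}")
print(f"reversed rankings,  top 4 vs top 4: p = {opposite[(4, 4)]:.4f}")
```

Plotting the resulting grid of P values (usually on a negative log scale) gives the heat map that this family of methods uses to show where two ranked profiles agree.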


BMC Genomics ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Kyu-Sang Lim ◽  
Qian Dong ◽  
Pamela Moll ◽  
Jana Vitkovska ◽  
Gregor Wiktorin ◽  
...  

Background: Gene expression profiling in blood is a potential source of biomarkers to evaluate or predict phenotypic differences between pigs but is expensive and inefficient because of the high abundance of globin mRNA in porcine blood. These limitations can be overcome by the use of QuantSeq 3’mRNA sequencing (QuantSeq) combined with a method to deplete or block the processing of globin mRNA prior to or during library construction. Here, we validated the effectiveness of QuantSeq using a novel specific globin blocker (GB) that is included in the library preparation step of QuantSeq. Results: In data set 1, four concentrations of the GB were applied to RNA samples from two pigs. The GB significantly reduced the proportion of globin reads compared to non-GB (NGB) samples (P = 0.005) and increased the number of detectable non-globin genes. The highest evaluated concentration (C1) of the GB resulted in the largest reduction of globin reads compared to the NGB (from 56.4 to 10.1%). The second highest concentration C2, which showed very similar globin depletion rates (12%) as C1 but a better correlation of the expression of non-globin genes between NGB and GB (r = 0.98), allowed the expression of an additional 1295 non-globin genes to be detected, although 40 genes that were detected in the NGB sample (at a low level) were not present in the GB library. Concentration C2 was applied in the rest of the study. In data set 2, the distribution of the percentage of globin reads for NGB (n = 184) and GB (n = 189) samples clearly showed the effects of the GB on reducing globin reads, in particular for HBB, similar to results from data set 1. Data set 3 (n = 84) revealed that the proportion of globin reads that remained in GB samples was significantly and positively correlated with the reticulocyte count in the original blood sample (P < 0.001).
Conclusions: The effect of the GB on reducing the proportion of globin reads in porcine blood QuantSeq was demonstrated in three data sets. In addition to increasing the efficiency of sequencing non-globin mRNA, the GB for QuantSeq has an advantage that it does not require an additional step prior to or during library creation. Therefore, the GB is a useful tool in the quantification of whole gene expression profiles in porcine blood.


Blood ◽  
2008 ◽  
Vol 112 (11) ◽  
pp. sci-51-sci-51
Author(s):  
Todd R. Golub

Genomics holds particular potential for the elucidation of biological networks that underlie disease. For example, gene expression profiles have been used to classify human cancers, and have more recently been used to predict graft rejection following organ transplantation. Such signatures thus hold promise both as diagnostic approaches and as tools with which to dissect biological mechanism. Such systems-based approaches are also beginning to impact the drug discovery process. For example, it is now feasible to measure gene expression signatures at low cost and high throughput, thereby allowing for the screening of small-molecule libraries in order to identify compounds capable of perturbing a signature of interest (even if the critical drivers of that signature are not yet known). This approach, known as Gene Expression-Based High Throughput Screening (GE-HTS), has been shown to identify candidate therapeutic approaches in AML, Ewing sarcoma, and neuroblastoma, and has identified tool compounds capable of inhibiting PDGF receptor signaling. A related approach, known as the Connectivity Map (www.broad.mit.edu/cmap), attempts to use gene expression profiles as a universal language with which to connect cellular states, gene product function, and drug action. In this manner, a gene expression signature of interest is used to computationally query a database of gene expression profiles of cells systematically treated with a large number of compounds (e.g., all off-patent FDA-approved drugs), thereby identifying potential new applications for existing drugs. Such systems-level approaches thus seek chemical modulators of cellular states, even when the molecular basis of such altered states is unknown.


Blood ◽  
2011 ◽  
Vol 118 (21) ◽  
pp. 629-629
Author(s):  
Yiming Zhou ◽  
Qing Zhang ◽  
Christoph Heuck ◽  
Owen Stephens ◽  
Erming Tian ◽  
...  

Abstract 629. Background: Cytogenetic abnormalities (CA) are a hallmark of multiple myeloma (MM) and other cancers and are commonly used as clinical parameters for determining disease stage and guiding therapy decisions. Traditional techniques, including fluorescence in situ hybridization (FISH) and karyotyping, and the recently developed array-based comparative genomic hybridization are expensive and time consuming. As gene expression profiling (GEP) is becoming more integrated in the diagnostic workup of MM and is increasingly being used for risk stratification as well as tailoring therapy, we are presented with vast amounts of data that should reflect disease-associated alterations of the genome. We therefore sought to develop a GEP-based virtual CA (vCA) model to predict CA in MM. Methods/Results: We determined genome-wide gene expression profiles and DNA copy numbers (CNs) in purified plasma cell samples obtained from 92 newly diagnosed MM patients, using the Affymetrix GeneChip and the Agilent aCGH platforms, respectively. We identified 1,114 CN-sensitive genes by Pearson's correlation coefficient (PCC) of gene expression levels and the copy numbers of the corresponding DNA loci, keeping the false discovery rate to <5%. On the basis of these CN-sensitive genes, we developed a vCA model for predicting CA in MM patients by means of GEP. The model focuses particularly on chromosomes 3, 5, 7, 9, 11, 13, 15, 19, and 21, as well as the 1p, 1q, and 6q segments, which are the most commonly altered chromosome regions in MM plasma cells. The reference CA (rCA) of a given chromosome region were determined by the mean values of signals of aCGH probes located in that region. The values of rCA could be used to distinguish among amplification, deletion, and normal. The predicted CA (pCA) of a given chromosome region were determined by the following procedures. First, we calculated the mean expression levels of CN-sensitive genes within the region.
Then, by training the model in a GEP data set with 92 MM samples, we set the cutoff value of the mean expression levels of CN-sensitive genes for each chromosome region in order to obtain pCA that were most consistent with rCA in terms of the Matthews correlation coefficient, a measure of the quality of binary (two-class) classifications. The mean prediction accuracy was 0.88 (0.59–0.99) when the model was applied to the training data set. To check for overfitting in the vCA model, we applied the model to an independent data set of 23 MM samples for which both GEP and aCGH data were available. The mean prediction accuracy was 0.89 (0.74–1.00), which indicated that overfitting was negligible if present at all. We further validated the model with a FISH data set compiled from 262 independent MM samples for which both FISH records and GEP data were available. The mean prediction accuracy was 0.87. The consistency between vCA-predicted chromosomal alterations and findings of karyotyping dropped to 0.65. However, this underperformance could be due to the fact that karyotyping is limited by the low proliferation rate of terminally differentiated plasma cells in vitro. Conclusion: Our results provide a proof of concept that GEP data alone can reveal all the information provided by conventional cytogenetic techniques. We show that re-purposing gene expression data using our model is a fast and economical way to obtain cytogenetic information that is accurate and can be used for diagnosis and observation in MM and potentially other malignancies. GEP can serve as a one-stop genomic data source for information from the level of specific genes to whole chromosomes. 
Disclosures: Barlogie: Celgene: Consultancy, Honoraria, Research Funding; IMF: Consultancy, Honoraria; MMRF: Consultancy; Millennium: Consultancy, Honoraria, Research Funding; Genzyme: Consultancy; Novartis: Research Funding; NCI: Research Funding; Johnson & Johnson: Research Funding; Centocor: Research Funding; Onyx: Research Funding; Icon: Research Funding. Shaughnessy:Myeloma Health, Celgene, Genzyme, Novartis: Consultancy, Employment, Equity Ownership, Honoraria, Patents & Royalties.
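The cutoff-training step described in this abstract (choosing, for each chromosome region, the threshold on mean expression that makes predicted calls agree best with the reference calls in terms of the Matthews correlation coefficient) can be sketched as below. The sketch simplifies the problem to a single region and a binary call (altered versus normal); the function names, the candidate cutoffs, and the toy values are assumptions, not the authors' implementation.

```python
import math

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for two boolean sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def best_cutoff(mean_expression, reference):
    """Try each observed value as a cutoff; return (cutoff, MCC) with the highest MCC."""
    best = (None, -2.0)
    for cutoff in sorted(set(mean_expression)):
        predicted = [x >= cutoff for x in mean_expression]
        mcc = matthews_corrcoef(reference, predicted)
        if mcc > best[1]:
            best = (cutoff, mcc)
    return best

# Toy training data (invented) for one chromosome region:
# mean expression of CN-sensitive genes per sample, and aCGH-based reference calls.
mean_expression = [0.9, 1.0, 1.1, 1.2, 1.8, 2.0, 2.1, 2.3]
reference_amplified = [False, False, False, False, True, True, True, True]

cutoff, mcc = best_cutoff(mean_expression, reference_amplified)
print(f"chosen cutoff: {cutoff}, MCC on training data: {mcc:.2f}")
```

As in the abstract, a cutoff tuned this way should then be checked on independent samples, since agreement on the training data alone says little about overfitting.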

