Prediction of cancer mutation states using multiple data modalities reveals the utility and consistency of gene expression and DNA methylation

2021 ◽  
Author(s):  
Jake Crawford ◽  
Brock C Christensen ◽  
Maria Chikina ◽  
Casey S Greene

In studies of cellular function in cancer, researchers are increasingly able to choose from many -omics assays as functional readouts. Choosing the correct readout for a given study can be difficult, and which layer of cellular function is most suitable to capture the relevant signal may be unclear. In this study, we consider prediction of cancer mutation status (presence or absence) from functional -omics data as a representative problem. Since functional signatures of cancer mutation have been identified across many data types, this problem presents an opportunity to quantify and compare the ability of different -omics readouts to capture signals of dysregulation in cancer. The TCGA Pan-Cancer Atlas contains genetic alteration data including somatic mutations and copy number variants (CNVs), as well as several -omics data types. From TCGA, we focus on RNA sequencing, DNA methylation arrays, reverse phase protein arrays (RPPA), microRNA, and somatic mutational signatures as -omics readouts. Across a collection of cancer-associated genetic alterations, RNA sequencing and DNA methylation were the most effective predictors of alteration state. Surprisingly, we found that for most alterations, they were approximately equally effective predictors. The target gene was the primary driver of performance, rather than the data type, and there was little difference between the top data types for the majority of genes. We also found that combining data types into a single multi-omics model often provided little or no improvement in predictive ability over the best individual data type. Based on our results, for the design of studies focused on the functional outcomes of cancer mutations, we recommend focusing on gene expression or DNA methylation as first-line readouts.

Blood ◽  
2011 ◽  
Vol 118 (19) ◽  
pp. 5218-5226 ◽  
Author(s):  
Laura E. Hogan ◽  
Julia A. Meyer ◽  
Jun Yang ◽  
Jinhua Wang ◽  
Nicholas Wong ◽  
...  

Abstract Despite an increase in survival for children with acute lymphoblastic leukemia (ALL), the outcome after relapse is poor. To understand the genetic events that contribute to relapse and chemoresistance and identify novel targets of therapy, 3 high-throughput assays were used to identify genetic and epigenetic changes at relapse. Using matched diagnosis/relapse bone marrow samples from children with relapsed B-precursor ALL, we evaluated gene expression, copy number abnormalities (CNAs), and DNA methylation. Gene expression analysis revealed a signature of differentially expressed genes from diagnosis to relapse that is different for early (< 36 months) and late (≥ 36 months) relapse. CNA analysis discovered CNAs that were shared at diagnosis and relapse and others that were new lesions acquired at relapse. DNA methylation analysis found increased promoter methylation at relapse. There were many genetic alterations that evolved from diagnosis to relapse, and in some cases these genes had previously been associated with chemoresistance. Integration of the results from all 3 platforms identified genes of potential interest, including CDKN2A, COL6A2, PTPRO, and CSMD1. Although our results indicate that a diversity of genetic changes is seen at relapse, integration of gene expression, CNA, and methylation data suggests a possible convergence on the WNT and mitogen-activated protein kinase pathways.


2019 ◽  
Author(s):  
Aziz Al’Khafaji ◽  
Catherine Gutierrez ◽  
Eric Brenner ◽  
Russell Durrett ◽  
Kaitlyn E. Johnson ◽  
...  

Abstract The remarkable evolutionary capacity of cancer is a major challenge to current therapeutic efforts. Fueling this evolution is its vast clonal heterogeneity and ability to adapt to diverse selective pressures. Although the genetic and transcriptional mechanisms underlying these responses have been independently evaluated, the ability to couple genetic alterations present within individual clones to their respective transcriptional or functional outputs has been lacking in the field. To this end, we developed a high-complexity expressed barcode library that integrates DNA barcoding with single-cell RNA sequencing through use of the CROP-seq sgRNA expression/capture system, and which is compatible with the COLBERT clonal isolation workflow for subsequent genomic and epigenomic characterization of specific clones of interest. We applied this approach to study chronic lymphocytic leukemia (CLL), a mature B cell malignancy notable for its genetic and transcriptomic heterogeneity and variable disease course. Here, we demonstrate the clonal composition and gene expression states of HG3, a CLL cell line harboring the common alteration del(13q), in response to front-line cytotoxic therapy of fludarabine and mafosfamide (an analog of the clinically used cyclophosphamide). Analysis of clonal abundance and clonally-resolved single-cell RNA sequencing revealed that only a small fraction of clones consistently survived therapy. These rare highly drug tolerant clones comprise 94% of the post-treatment population and share a stable, pre-existing gene expression state characterized by upregulation of CXCR4 and WNT signaling and a number of DNA damage and cell survival genes. Taken together, these data demonstrate at unprecedented resolution the diverse clonal characteristics and therapeutic responses of a heterogeneous cancer cell population.
Further, this approach provides a template for the high-resolution study of thousands of clones and the respective gene expression states underlying their response to therapy.


2018 ◽  
Author(s):  
Koen Van Den Berge ◽  
Katharina Hembach ◽  
Charlotte Soneson ◽  
Simone Tiberi ◽  
Lieven Clement ◽  
...  

Gene expression is the fundamental level at which the results of various genetic and regulatory programs are observable. The measurement of transcriptome-wide gene expression has convincingly switched from microarrays to sequencing in a matter of years. RNA sequencing (RNA-seq) provides a quantitative and open system for profiling transcriptional outcomes on a large scale and therefore facilitates a large diversity of applications, including basic science studies, but also agricultural or clinical situations. In the past 10 years or so, much has been learned about the characteristics of the RNA-seq datasets as well as the performance of the myriad of methods developed. In this review, we give an overall view of the developments in RNA-seq data analysis, including experimental design, with an explicit focus on quantification of gene expression and statistical approaches for differential expression. We also highlight emerging data types, such as single-cell RNA-seq and gene expression profiling using long-read technologies.


F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 1444
Author(s):  
Charity W. Law ◽  
Kathleen Zeglinski ◽  
Xueyi Dong ◽  
Monther Alhamdoosh ◽  
Gordon K. Smyth ◽  
...  

Differential expression analyses of genomic data types, such as RNA-sequencing experiments, use linear models to determine the size and direction of the changes in gene expression. For RNA-sequencing, there are several established software packages for this purpose, accompanied by well-described analysis pipelines. However, there are two crucial steps in the analysis process that can be a stumbling block for many: setting up an appropriate model via design matrices, and setting up comparisons of interest via contrast matrices. These steps are particularly troublesome because an extensive catalogue of design and contrast matrices does not currently exist. One would usually search for example case studies across different platforms and mix and match the advice from those sources to suit the dataset at hand. This article guides the reader through the basics of how to set up design and contrast matrices. We take a practical approach by providing code and a graphical representation of each case study, starting with simpler examples (e.g. models with a single explanatory variable) and moving on to more complex ones (e.g. interaction models, mixed effects models, higher order time series and cyclical models). Although our work has been written specifically with a limma-style pipeline in mind, most of it is also applicable to other software packages for differential expression analysis, and the ideas covered can be adapted to data analysis of other high-throughput technologies. Where appropriate, we explain the interpretation of and differences between models to aid readers in their own model choices. Unnecessary jargon and theory are omitted where possible so that our work is accessible to a wide audience of readers, from beginners to those with experience in genomics data analysis.
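To make the two steps named in this abstract concrete, the following is a minimal sketch of a design matrix and a contrast for a single explanatory variable with two groups. It is written in plain Python and is not the limma API or the article's own code; the function names (`means_model_design`, `fit_means_model`, `apply_contrast`), the group labels, and the expression values are all hypothetical and chosen only for illustration.

```python
def means_model_design(groups, levels):
    """Build a 'means model' design matrix: one 0/1 indicator column per group."""
    return [[1.0 if g == level else 0.0 for level in levels] for g in groups]


def fit_means_model(design, y):
    """Least-squares coefficients for a means-model design.

    The indicator columns are mutually orthogonal, so each coefficient
    reduces to sum(x * y) / sum(x * x), which is the mean of that group.
    This shortcut is only valid for designs with orthogonal columns.
    """
    n_cols = len(design[0])
    coefs = []
    for j in range(n_cols):
        col = [row[j] for row in design]
        coefs.append(sum(x * v for x, v in zip(col, y)) / sum(x * x for x in col))
    return coefs


def apply_contrast(coefs, contrast):
    """Combine coefficients with contrast weights to get the comparison of interest."""
    return sum(c * b for c, b in zip(contrast, coefs))


# Hypothetical log-expression values for one gene in six samples.
groups = ["control", "control", "control", "treated", "treated", "treated"]
levels = ["control", "treated"]
log_expr = [5.0, 5.5, 4.5, 7.0, 7.5, 6.5]

design = means_model_design(groups, levels)
coefs = fit_means_model(design, log_expr)

# Contrast weights (-1, +1) ask for "treated minus control".
treated_vs_control = apply_contrast(coefs, [-1.0, 1.0])
print(design)
print(coefs, treated_vs_control)
```

In this parameterisation each coefficient is a group mean and the contrast expresses the comparison explicitly. A design with an intercept column would instead encode the same difference directly as its second coefficient, which is the kind of modelling choice the abstract refers to.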


2020 ◽  
pp. 210-220 ◽  
Author(s):  
Hayley M. Dingerdissen ◽  
Frederic Bastian ◽  
K. Vijay-Shanker ◽  
Marc Robinson-Rechavi ◽  
Amanda Bell ◽  
...  

PURPOSE The purpose of OncoMX knowledgebase development was to integrate cancer biomarker and relevant data types into a meta-portal, enabling the research of cancer biomarkers side by side with other pertinent multidimensional data types. METHODS Cancer mutation, cancer differential expression, cancer expression specificity, healthy gene expression from human and mouse, literature mining for cancer mutation and cancer expression, and biomarker data were integrated, unified by relevant biomedical ontologies, and subjected to rule-based automated quality control before ingestion into the database. RESULTS OncoMX provides integrated data encompassing more than 1,000 unique biomarker entries (939 from the Early Detection Research Network [EDRN] and 96 from the US Food and Drug Administration) mapped to 20,576 genes that have either mutation or differential expression in cancer. Sentences reporting mutation or differential expression in cancer were extracted from more than 40,000 publications, and healthy gene expression data with samples mapped to organs are available for both human genes and their mouse orthologs. CONCLUSION OncoMX has prioritized user feedback as a means of guiding development priorities. By mapping to and integrating data from several cancer genomics resources, it is hoped that OncoMX will foster a dynamic engagement between bioinformaticians and cancer biomarker researchers. This engagement should culminate in a community resource that substantially improves the ability and efficiency of exploring cancer biomarker data and related multidimensional data.


Author(s):  
Tianzhong Yang ◽  
Peng Wei ◽  
Wei Pan

Abstract Motivation The abundance of omics data has facilitated integrative analyses of single and multiple molecular layers with genome-wide association studies focusing on common variants. Built on its successes, we propose a general analysis framework to leverage multi-omics data with sequencing data to improve the statistical power of discovering new associations and understanding of the disease susceptibility due to low-frequency variants. The proposed test features its robustness to model misspecification, high power across a wide range of scenarios and the potential of offering insights into the underlying genetic architecture and disease mechanisms. Results Using the Framingham Heart Study data, we show that low-frequency variants are predictive of DNA methylation, even after conditioning on the nearby common variants. In addition, DNA methylation and gene expression provide complementary information to functional genomics. In the Avon Longitudinal Study of Parents and Children with a sample size of 1497, one gene CLPTM1 is identified to be associated with low-density lipoprotein cholesterol levels by the proposed powerful adaptive gene-based test integrating information from gene expression, methylation and enhancer–promoter interactions. It is further replicated in the TwinsUK study with 1706 samples. The signal is driven by both low-frequency and common variants. Availability and implementation Models are available at https://github.com/ytzhong/DNAm. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 9 (1) ◽  
Author(s):  
Emily L. Flam ◽  
Ludmila Danilova ◽  
Dylan Z. Kelley ◽  
Elena Stavrovskaya ◽  
Theresa Guo ◽  
...  

Abstract Current literature suggests that epigenetically regulated super-enhancers (SEs) are drivers of aberrant gene expression in cancers. Many tumor types are still missing chromatin data to define cancer-specific SEs and their role in carcinogenesis. In this work, we develop a simple pipeline, which can utilize chromatin data from etiologically similar tumors to discover tissue-specific SEs and their target genes using gene expression and DNA methylation data. As an example, we applied our pipeline to human papillomavirus-related oropharyngeal squamous cell carcinoma (HPV + OPSCC). This tumor type is characterized by abundant gene expression changes, which cannot be explained by genetic alterations alone. Chromatin data are still limited for this disease, so we used 3627 SE elements from public domain data for closely related tissues, including normal and tumor lung, and cervical cancer cell lines. We integrated the available DNA methylation and gene expression data for HPV + OPSCC samples to filter the candidate SEs to identify functional SEs and their affected targets, which are essential for cancer development. Overall, we found 159 differentially methylated SEs, including 87 SEs that actively regulate expression of 150 nearby genes (211 SE-gene pairs) in HPV + OPSCC. Of these, 132 SE-gene pairs were validated in a related TCGA cohort. Pathway analysis revealed that the SE-regulated genes were associated with pathways known to regulate nasopharyngeal, breast, melanoma, and bladder carcinogenesis and are regulated by the epigenetic landscape in those cancers. Thus, we propose that gene expression in HPV + OPSCC may be controlled by epigenetic alterations in SE elements, which are common between related tissues. Our pipeline can utilize a diversity of data inputs and can be further adapted to SE analysis of diseased and non-diseased tissues from different organisms.


2014 ◽  
Vol 2014 ◽  
pp. 1-9 ◽  
Author(s):  
Cenny Taslim ◽  
Shili Lin

The invention of microarray and next generation sequencing technologies has revolutionized research in genomics; these platforms have led to massive amounts of data on gene expression, methylation, and protein-DNA interactions. A common theme among a number of biological problems using high-throughput technologies is differential analysis. Despite the common theme, different data types have their own unique features, creating a “moving target” scenario. As such, methods specifically designed for one data type may not lead to satisfactory results when applied to another data type. To meet this challenge so that not only currently existing data types but also data from future problems, platforms, or experiments can be analyzed, we propose a mixture modeling framework that is flexible enough to automatically adapt to any moving target. More specifically, the approach considers several classes of mixture models and essentially provides a model-based procedure whose model is adaptive to the particular data being analyzed. We demonstrate the utility of the methodology by applying it to three types of real data: gene expression, methylation, and ChIP-seq. We also carried out simulations to gauge the performance and showed that the approach can be more efficient than any individual model without inflating type I error.
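As a rough illustration of the general idea of mixture modeling for differential analysis, the sketch below fits a two-component Gaussian mixture to simulated per-feature difference scores with the EM algorithm. This is an assumption-laden stand-in, not the authors' framework: the abstract does not specify the component families, the simulated data, the function names (`normal_pdf`, `fit_two_component_mixture`), and the 0.5 posterior cutoff are all hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=200):
    """EM for a 1-D mixture of two Gaussians (illustrative choice of family)."""
    ordered = sorted(values)
    n = len(values)
    # Crude initialisation: quartiles for the means, overall spread for both SDs.
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    sd = [statistics.pstdev(values)] * 2
    weight = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each value belongs to each component.
        resp = []
        for x in values:
            p = [weight[k] * normal_pdf(x, mu[k], sd[k]) for k in range(2)]
            total = max(p[0] + p[1], 1e-300)  # guard against underflow
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, values)) / nk
            sd[k] = max(math.sqrt(var), 1e-6)
    return weight, mu, sd, resp


# Simulated scores: 300 unchanged features around 0, 100 changed features around 4.
rng = random.Random(0)
scores = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(4.0, 1.0) for _ in range(100)]

weight, mu, sd, resp = fit_two_component_mixture(scores)
# Call a feature "differential" when its posterior for the shifted component exceeds 0.5.
n_called = sum(1 for r in resp if r[1] > 0.5)
print(weight, mu, sd, n_called)
```

The posterior probabilities from the E-step are what make the procedure model-based: each feature is classified by how well the fitted components explain it. An adaptive framework of the kind the abstract describes would additionally compare several candidate mixture families and select among them, which this sketch does not attempt.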


2006 ◽  
Vol 84 (4) ◽  
pp. 463-466 ◽  
Author(s):  
Ana C. D’Alessio ◽  
Moshe Szyf

The epigenome, which comprises chromatin, associated proteins, and the pattern of covalent modification of DNA by methylation, sets up and maintains gene expression programs. It was originally believed that DNA methylation was the dominant reaction in determining the chromatin structure. However, emerging data suggest that chromatin can affect DNA methylation in both directions, triggering either de novo DNA methylation or demethylation. These events are particularly important for the understanding of cellular transformation, which requires a coordinated change in gene expression profiles. While genetic alterations can explain some of the changes, the important role of epigenetic reprogramming is becoming more and more evident. Cancer cells exhibit a paradoxical coexistence of global loss of DNA methylation with regional hypermethylation.

