Integrative, multi-omics, analysis of blood samples improves model predictions: applications to cancer

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Erica Ponzi ◽  
Magne Thoresen ◽  
Therese Haugdahl Nøst ◽  
Kajsa Møllersen

Abstract Background: Cancer genomic studies often include data collected from several omics platforms. Each omics data source contributes to the understanding of the underlying biological process via source-specific (“individual”) patterns of variability. At the same time, statistical associations and potential interactions among the different data sources can reveal signals from common biological processes that might not be identified by single-source analyses. These common patterns of variability are referred to as “shared” or “joint”. In this work, we show how the use of joint and individual components can lead to better predictive models, and to a deeper understanding of the biological process at hand. We identify joint and individual contributions of DNA methylation, miRNA and mRNA expression collected from blood samples in a lung cancer case–control study nested within the Norwegian Women and Cancer (NOWAC) cohort study, and we use such components to build prediction models for case–control and metastatic status. To assess the quality of predictions, we compare models based on simultaneous, integrative analysis of multi-source omics data to a standard non-integrative analysis of each single omics dataset, and to penalized regression models. Additionally, we apply the proposed approach to a breast cancer dataset from The Cancer Genome Atlas. Results: Our results show how an integrative analysis that preserves both components of variation is more appropriate than standard multi-omics analyses that are not based on such a distinction. Both joint and individual components are shown to contribute to a better quality of model predictions, and facilitate the interpretation of the underlying biological processes in lung cancer development. Conclusions: In the presence of multiple omics data sources, we recommend the use of data integration techniques that preserve the joint and individual components across the omics sources. We show how the inclusion of such components increases the quality of model predictions of clinical outcomes.

2020 ◽  
Author(s):  
Erica Ponzi ◽  
Magne Thoresen ◽  
Therese Haugdahl Nøst ◽  
Kajsa Møllersen

Abstract Background: Cancer genomic studies often include data collected from several omics platforms. Each omics data source contributes to the understanding of the underlying biological process via source-specific (“individual”) patterns of variability. At the same time, statistical associations and potential interactions among the different data sources can reveal signals from common biological processes that might not be identified by single-source analyses. These common patterns of variability are referred to as “shared” or “joint”. To capture both contributions of variance, integrative dimension reduction techniques are needed. Integrated PCA is a model-based generalization of principal components analysis that separates shared and source-specific variance by iteratively estimating covariance structures from a matrix normal distribution. Angle-based JIVE is a matrix factorization method that decomposes joint and individual variation by permutation of row subspaces. We apply these techniques to identify joint and individual contributions of DNA methylation, miRNA and mRNA expression collected from blood samples in a lung cancer case–control study nested within the Norwegian Women and Cancer (NOWAC) cohort study. Results: In this work, we show how an integrative analysis that preserves both components of variation is more appropriate than analyses considering only individual or only joint components. Our results show how both joint and individual components contribute to a better quality of model predictions, and facilitate the interpretation of the underlying biological processes. Conclusions: When compared to a non-integrative analysis of the three omics sources, integrative models that simultaneously include joint and individual components result in better prediction of cancer status and metastatic cancer at diagnosis.
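The idea of separating joint from individual variation can be illustrated with a small NumPy sketch. This is a simplified, assumption-laden illustration in the spirit of angle-based JIVE, not the authors' implementation: the signal ranks and the number of joint components are supplied by hand rather than estimated, and the function name `joint_individual_scores` and the synthetic data are hypothetical.

```python
import numpy as np


def joint_individual_scores(blocks, ranks, n_joint):
    """Split sample scores into joint and block-specific parts.

    blocks  : list of (n_samples, n_features_k) arrays with matched rows
    ranks   : assumed signal rank of each block
    n_joint : assumed number of joint components
    """
    centered = [X - X.mean(axis=0) for X in blocks]

    # Step 1: low-rank score basis (left singular vectors) for each block.
    bases = []
    for Xc, r in zip(centered, ranks):
        U, _, _ = np.linalg.svd(Xc, full_matrices=False)
        bases.append(U[:, :r])

    # Step 2: SVD of the stacked bases. Directions present in every block
    # get singular values close to sqrt(number of blocks).
    U, s, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    joint = U[:, :n_joint]

    # Step 3: remove the joint directions, then take the leading
    # remaining components of each block as its individual scores.
    individual = []
    for Xc, r in zip(centered, ranks):
        resid = Xc - joint @ (joint.T @ Xc)
        Ui, si, _ = np.linalg.svd(resid, full_matrices=False)
        k = r - n_joint
        individual.append(Ui[:, :k] * si[:k])
    return joint, individual, s


# Synthetic stand-ins for three omics blocks measured on the same samples.
rng = np.random.default_rng(0)
n = 100
shared = rng.normal(size=(n, 1))
blocks = []
for p in (20, 30, 40):
    own = rng.normal(size=(n, 1))
    signal = 3 * shared @ rng.normal(size=(1, p)) + 3 * own @ rng.normal(size=(1, p))
    blocks.append(signal + 0.1 * rng.normal(size=(n, p)))

joint, individual, s = joint_individual_scores(blocks, ranks=[2, 2, 2], n_joint=1)
```

The joint and individual score columns returned here could then serve as covariates in a prediction model for case–control status, which is the use the abstract describes.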


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Therese H. Nøst ◽  
Marit Holden ◽  
Tom Dønnem ◽  
Hege Bøvelstad ◽  
Charlotta Rylander ◽  
...  

Abstract Recent studies have indicated that there are functional genomic signals that can be detected in blood years before cancer diagnosis. This study aimed to assess gene expression in prospective blood samples from the Norwegian Women and Cancer cohort focusing on time to lung cancer diagnosis and metastatic cancer using a nested case–control design. We employed several approaches to statistically analyze the data and the methods indicated that the case–control differences were subtle but most distinguishable in metastatic case–control pairs in the period 0–3 years prior to diagnosis. The genes of interest along with estimated blood cell populations could indicate disruption of immunological processes in blood. The genes identified from approaches focusing on alterations with time to diagnosis were distinct from those focusing on the case–control differences. Our results support that explorative analyses of prospective blood samples could indicate circulating signals of disease-related processes.


2008 ◽  
Vol 105 (46) ◽  
pp. 17700-17705 ◽  
Author(s):  
Richard Llewellyn ◽  
David S. Eisenberg

As genome sequencing outstrips the rate of high-quality, low-throughput biochemical and genetic experimentation, accurate annotation of protein function becomes a bottleneck in the progress of the biomolecular sciences. Most gene products are now annotated by homology, in which an experimentally determined function is applied to a similar sequence. This procedure becomes error-prone between more divergent sequences and can contaminate biomolecular databases. Here, we propose a computational method of assignment of function, termed Generalized Functional Linkages (GFL), that combines nonhomology-based methods with other types of data. Functional linkages describe pairwise relationships between proteins that work together to perform a biological task. GFL provides a Bayesian framework that improves annotation by arbitrating a competition among biological process annotations to best describe the target protein. GFL addresses the unequal strengths of functional linkages among proteins, the quality of existing annotations, and the similarity among them while incorporating available knowledge about the cellular location or individual molecular function of the target protein. We demonstrate GFL with functional linkages defined by an algorithm known as zorch that quantifies connectivity in protein–protein interaction networks. Even when using proteins linked only by indirect or high-throughput interactions, GFL predicts the biological processes of many proteins in Saccharomyces cerevisiae, improving the accuracy of annotation by 20% over majority voting.
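To make the majority-voting baseline concrete, the sketch below contrasts it with a simple strength-weighted vote. The weighted vote is only an illustration of why unequal linkage strengths matter; it is not GFL's Bayesian framework, and the function names, process labels, and linkage strengths are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote(links):
    """Pick the process annotated on the most linked proteins."""
    counts = Counter(process for process, _ in links)
    return counts.most_common(1)[0][0]


def weighted_vote(links):
    """Pick the process with the largest summed linkage strength."""
    totals = defaultdict(float)
    for process, strength in links:
        totals[process] += strength
    return max(totals, key=totals.get)


# Each entry: (process annotation of a linked protein, linkage strength).
links = [
    ("ribosome biogenesis", 0.9),
    ("DNA repair", 0.2),
    ("DNA repair", 0.1),
]

by_count = majority_vote(links)
by_strength = weighted_vote(links)
```

In this toy input, two weak linkages outvote one strong linkage under majority voting, while the weighted vote favors the strongly linked annotation.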


2017 ◽  
Author(s):  
Tycho Bismeijer ◽  
Sander Canisius ◽  
Lodewyk Wessels

Abstract Effective cancer treatment is crucially dependent on the identification of the biological processes that drive a tumor. However, multiple processes may be active simultaneously in a tumor. Clustering is inherently unsuited to this task as it assigns a tumor to a single cluster. In addition, the wide availability of multiple data types per tumor provides the opportunity to profile the processes driving a tumor more comprehensively. Here we introduce Functional Sparse-Factor Analysis (funcSFA) to address these challenges. FuncSFA integrates multiple data types to define a lower dimensional space capturing the relevant variation. A tailor-made module associates biological processes with these factors. FuncSFA is inspired by iCluster, which we improve in several key aspects. First, we increase the convergence efficiency significantly, allowing the analysis of multiple molecular datasets that have not been pre-matched to contain only concordant features. Second, FuncSFA does not assign tumors to discrete clusters, but identifies the dominant driver processes active in each tumor. This is achieved by a regression of the factors on the RNA expression data followed by a functional enrichment analysis and manual curation step. We apply FuncSFA to the TCGA breast and lung datasets. We identify EMT and immune processes common to both cancer types. In the breast cancer dataset we recover the known intrinsic subtypes and identify additional processes. These include immune infiltration and EMT, and processes driven by copy number gains on the 8q chromosome arm. In lung cancer we recover the major types (adenocarcinoma and squamous cell carcinoma) and processes active in both of these types. These include EMT, two immune processes, and the activity of the NFE2L2 transcription factor. In summary, FuncSFA is a robust method to perform discovery of key driver processes in a collection of tumors through unsupervised integration of multiple molecular data types and functional annotation.
Author Summary: In order to select effective cancer treatment, we need to determine which biological processes are active in a tumor. To this end, tumors have been quantified by high dimensional molecular measurements such as RNA sequencing and DNA copy number profiling. In order to support decision making, these measurements need to be condensed into interpretable summaries. Such summaries can be made interpretable by connecting them to biological processes. Biological process activity is continuous and multiple biological processes are taking place in a single tumor. Therefore, the biological processes associated with a tumor are misrepresented by clustering, which tries to put every tumor in a single cluster. In the method introduced in this paper (funcSFA), molecular measurements are summarized into a small number of factors. A factor is a continuous value per tumor that aims to represent the activity of a biological process. When applied to breast and lung cancer, funcSFA identifies factors covering well known biology of these tumor types. FuncSFA also finds novel factors covering biology whose importance is not yet widely recognized in these tumor types. Some of the factors suggest treatment opportunities that can be further investigated in cell lines and mice.


2016 ◽  
Vol 1 (13) ◽  
pp. 162-168
Author(s):  
Pippa Hales ◽  
Corinne Mossey-Gaston

Lung cancer is one of the most commonly diagnosed cancers across Northern America and Europe. Treatment options offered are dependent on the type of cancer, the location of the tumor, the staging, and the overall health of the person. When surgery for lung cancer is offered, difficulty swallowing is a potential complication that can have several influencing factors. Surgical interaction with the recurrent laryngeal nerve (RLN) can lead to unilateral vocal cord palsy, altering swallow function and safety. Understanding whether the RLN has been preserved, damaged, or sacrificed is integral to understanding the effect on the swallow and the subsequent treatment options available. There is also the risk of post-surgical reduction of physiological reserve, which can reduce the strength and function of the swallow in addition to any surgery-specific complications. As lung cancer has a limited prognosis, the clinician must also factor in the palliative phase, as this can further increase the burden of an already compromised swallow. By understanding the surgery and the implications this may have for the swallow, there is the potential to reduce the impact of post-surgical complications and so improve quality of life (QOL) for people with lung cancer.

