STATegra: Multi-omics data integration - A conceptual scheme and a bioinformatics pipeline

2020 ◽  
Author(s):  
Nuria Planell ◽  
Vincenzo Lagani ◽  
Patricia Sebastian-Leon ◽  
Frans van der Kloet ◽  
Ewoud Ewing ◽  
...  

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. It is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining machine learning component analysis, non-parametric data combination and a multi-omics exploratory analysis in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two TCGA case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma cases, we demonstrate an enhanced capacity to identify features compared with single-omics analysis. Such an integrative multi-omics analysis framework for the identification of features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled, and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package https://bioconductor.org/packages/release/bioc/html/STATegra.html.

2021 ◽  
Vol 12 ◽  
Author(s):  
Nuria Planell ◽  
Vincenzo Lagani ◽  
Patricia Sebastian-Leon ◽  
Frans van der Kloet ◽  
Ewoud Ewing ◽  
...  

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework, beyond the individual tools, to identify features and pathways compared with single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.
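The joint component-analysis idea behind this kind of multi-omics integration can be illustrated with a minimal sketch. The code below is not the STATegra implementation (the STATegRa Bioconductor package provides its own component-analysis methods); it simply shows, on invented data, how two omics blocks measured on the same samples can share a latent signal that a joint decomposition recovers, using scikit-learn's CCA as a generic stand-in.

```python
# Minimal sketch of joint component analysis across two omics layers.
# All data are synthetic; CCA is used here only as a generic stand-in
# for the component-analysis step, not as the STATegra method.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 100

# Simulate a shared biological signal driving both (invented) omics layers.
shared = rng.normal(size=(n_samples, 2))
expression = shared @ rng.normal(size=(2, 20)) + rng.normal(scale=0.5, size=(n_samples, 20))
methylation = shared @ rng.normal(size=(2, 15)) + rng.normal(scale=0.5, size=(n_samples, 15))

# Scale each omics block, then extract paired joint components.
X = StandardScaler().fit_transform(expression)
Y = StandardScaler().fit_transform(methylation)
x_scores, y_scores = CCA(n_components=2).fit_transform(X, Y)

# High correlation between paired scores indicates a component shared
# across the two omics layers.
for k in range(2):
    r = np.corrcoef(x_scores[:, k], y_scores[:, k])[0, 1]
    print(f"joint component {k + 1}: canonical correlation = {r:.2f}")
```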


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1968 ◽  
Author(s):  
Roderic Guigo ◽  
Michiel de Hoon

At the beginning of this century, the Human Genome Project produced the first drafts of the human genome sequence. Following this, large-scale functional genomics studies were initiated to understand the molecular basis underlying the translation of the instructions encoded in the genome into the biological traits of organisms. Instrumental in the ensuing revolution in functional genomics were the rapid advances in massively parallel sequencing technologies as well as the development of a wide diversity of protocols that make use of these technologies to understand cellular behavior at the molecular level. Here, we review recent advances in functional genomic methods, discuss some of their current capabilities and limitations, and briefly sketch future directions within the field.


2018 ◽  
Vol 62 (4) ◽  
pp. 563-574 ◽  
Author(s):  
Charlotte Ramon ◽  
Mattia G. Gollub ◽  
Jörg Stelling

At genome scale, it is not yet possible to devise detailed kinetic models for metabolism because data on the in vivo biochemistry are too sparse. Predictive large-scale models for metabolism most commonly use the constraint-based framework, in which network structures constrain possible metabolic phenotypes at steady state. However, these models commonly leave many possibilities open, making them less predictive than desired. With increasingly available omics data, it is appealing to increase the predictive power of constraint-based models (CBMs) through data integration. Many corresponding methods have been developed, but data integration is still a challenge and existing methods perform less well than expected. Here, we review the main approaches for integrating different types of omics data into CBMs, focusing on the methods' assumptions and limitations. We argue that key assumptions – often derived from single-enzyme kinetics – do not generally apply in the context of networks, thereby explaining current limitations. Emerging methods bridging CBMs and biochemical kinetics may allow for omics data integration in a common framework to provide more accurate predictions.
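A minimal sketch of the constraint-based framework described above: flux balance analysis on a toy network, where fluxes must satisfy the steady-state constraint S v = 0 and flux bounds, and an objective flux is maximized by linear programming. The stoichiometry, bounds, and objective here are invented for illustration; genome-scale CBMs involve thousands of reactions and dedicated toolboxes.

```python
# Toy flux balance analysis: maximize an objective flux subject to
# steady-state mass balance (S v = 0) and flux bounds.
import numpy as np
from scipy.optimize import linprog

# Rows = metabolites A and B, columns = reactions:
#   R1: -> A,  R2: A -> B,  R3: B -> (objective, e.g. biomass)
S = np.array([
    [1.0, -1.0,  0.0],   # metabolite A
    [0.0,  1.0, -1.0],   # metabolite B
])

# Flux bounds for each reaction (arbitrary, for illustration).
bounds = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]

# Maximize flux through R3 (linprog minimizes, so negate the objective).
c = np.array([0.0, 0.0, -1.0])

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)
print("maximal objective flux:", -res.fun)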


2021 ◽  
Author(s):  
Matti Hoch ◽  
Suchi Smita Gupta ◽  
Konstantin Cesnulevicius ◽  
David Lescheid ◽  
Myron Schultz ◽  
...  

Disease maps have emerged as computational knowledge bases for exploring and modeling disease-specific molecular processes. By capturing molecular interactions, disease-associated processes, and phenotypes in standardized representations, disease maps provide a platform for applying bioinformatics and systems biology approaches. Applications range from simple map exploration to algorithm-driven target discovery and network perturbation. The web-based MINERVA environment for disease maps provides a platform for developing tools not only for mapping experimental data but also for identifying, analyzing, and simulating disease-specific regulatory networks. We have developed a MINERVA plugin suite based on network topology and enrichment analyses that facilitates multi-omics data integration and enables in silico perturbation experiments on disease maps. We demonstrate workflows by analyzing two RNA-seq datasets on the Atlas of Inflammation Resolution (AIR). Our approach improves usability and increases the functionality of disease maps by providing easy access to available data and integration of self-generated data. It supports efficient and intuitive analysis of omics data, with a focus on disease maps.
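The enrichment component of such analyses can be sketched as a simple over-representation test: is a set of differentially expressed genes over-represented among the genes annotated to one disease-map process? The example below uses a hypergeometric test on invented gene identifiers and set sizes; the MINERVA plugins described above provide their own implementations on real maps.

```python
# Minimal over-representation test with a hypergeometric null.
# Gene identifiers and set sizes are invented for illustration only.
from scipy.stats import hypergeom

# Toy universe of measured genes and a differentially expressed subset.
universe = {f"g{i}" for i in range(1, 201)}                 # 200 measured genes
deg = {"g1", "g2", "g3", "g5", "g8", "g13", "g21", "g34"}   # 8 hits

# A hypothetical disease-map process annotated with 20 genes.
process_genes = {f"g{i}" for i in range(1, 21)}

overlap = deg & process_genes
# P(overlap >= observed) under the hypergeometric null.
p_value = hypergeom.sf(len(overlap) - 1, len(universe), len(process_genes), len(deg))
print(f"overlap = {len(overlap)} genes, enrichment p = {p_value:.3g}")
```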


2006 ◽  
Vol 3 (1) ◽  
pp. 45-55
Author(s):  
P. Romano ◽  
G. Bertolini ◽  
F. De Paoli ◽  
M. Fattore ◽  
D. Marra ◽  
...  

The Human Genome Project has deeply transformed biology, and the field has since expanded to the management, processing, analysis and visualization of large quantities of data from genomics, proteomics, medicinal chemistry and drug screening. This huge amount of data, and the heterogeneity of the software tools in use, implies the adoption on a very large scale of new, flexible tools that enable researchers to integrate data and analyses on the network. ICT standards and tools, like Web Services and related languages, and workflow management systems, can support the creation and deployment of such systems. While a number of Web Services are appearing and personal workflow management systems are increasingly being offered to researchers, a reference portal enabling the vast majority of non-specialist researchers to benefit from these new technologies is still lacking. In this paper, we introduce the rationale for the creation of such a portal and present the architecture and some preliminary results for the development of a portal for the enactment of workflows of interest in oncology.


Author(s):  
Debra J. H. Mathews

Public health genetics (more commonly referred to as "community genetics" in Europe) has been practiced to some degree in the West since at least the 1960s, but the development of a cohesive field took time and advances in technology. The application of genetics and genomics to prevent disease and promote public health became firmly established as a field in the late 1990s, as large-scale sequencing of the human genome began as part of the Human Genome Project. The field is now thriving, bringing both tremendous public health benefits and risks to individuals and populations. This chapter provides an overview of the section of The Oxford Handbook of Public Health Ethics dedicated to public health genetics. The chapters roughly trace the evolution of public health genetics from its roots in eugenics, to the present challenges faced in newborn screening and biobanking, and finally to emerging questions raised by the application of genomics to infectious disease.


2021 ◽  
pp. 13-36
Author(s):  
Christopher L. Cummings ◽  
Kaitlin M. Volk ◽  
Anna A. Ulanova ◽  
Do Thuy Uyen Ha Lam ◽  
Pei Rou Ng

The field of biotechnology has been rigorously researched and applied to many facets of everyday life. Biotechnology is defined as the process of modifying an organism or a biological system for an intended purpose. Biotechnology applications range from agricultural crop selection to pharmaceutical and genetic processes (Bauer and Gaskell 2002). The definition, however, is evolving with recent scientific advancements. Until World War II, biotechnology was primarily siloed in agricultural biology and chemical engineering. The results of this era included disease-resistant crops, pesticides, and other pest-controlling tools (Verma et al. 2011). After WWII, biotechnology began to shift domains when advanced research on human genetics and DNA started. In 1984, the Human Genome Project (HGP) was formally proposed, initiating the pursuit by the private and academic sectors to decode the human genome. The legacy of the project gave rise to ancillary advancements in data sharing and open-source software, and solidified the prominence of "big science": capital-intensive, large-scale private-public research initiatives that were once primarily under the purview of government-funded programs (Hood and Rowen 2013). After the HGP, the biotechnology industry boomed as a result of dramatic cost reductions in DNA sequencing. In 2019, the industry was estimated to be worth $449.06 billion globally and is projected to increase in value (Polaris 2020).


Author(s):  
Wolfgang Wurst ◽  
Achim Gossler

Gene trap (GT) strategies in mouse embryonic stem (ES) cells are increasingly being used for detecting patterns of gene expression (1-4), isolating and mutating endogenous genes (5-7), and identifying targets of signalling molecules and transcription factors (3, 8-10). The general term gene trap refers to the random integration of a reporter gene construct (called an entrapment vector) (11, 12) into the genome such that ‘productive' integration events bring the reporter gene under the transcriptional regulation of an endogenous gene. In some cases this also simultaneously generates an insertional mutation. Entrapment vectors were originally developed in bacteria (13), and applied in Drosophila to identify novel developmental genes and/or regulatory sequences (14-17). Subsequently, a modified strategy was developed for mouse in which the reporter gene mRNA becomes fused to an endogenous transcript. Such ‘gene trap' vectors were initially used primarily as a tool to discover genes involved in development (1, 2, 18). In the last five years there has been a significant shift of GT approaches in mouse to much broader, large-scale applications in the context of the analysis of mammalian genomes and ‘functional genomics'. Sequencing and physical mapping of both the human and mouse genomes are expected to be completed within the next five years. Already, a large number of mouse and human genes have been identified as expressed sequence tags (ESTs), and very likely the majority of genes will be discovered as ESTs shortly. This vast sequence information contrasts with a rather limited understanding of the in vivo functions of these genes. Whereas DNA sequence can provide some indication of the potential functions of these genes and their products, their physiological roles in the organism have to be determined by mutational analysis. Thus, the sequencing effort of the Human Genome Project has to be complemented by efficient functional analyses of the identified genes. One potentially powerful complement to the efforts of the Human Genome Project would be a strategy whereby large-scale random mutagenesis in mouse is combined with the rapid identification of the mutated genes (6, 7, 19, and German gene trap consortium, W. W. unpublished data).


2016 ◽  
Vol 23 (1) ◽  
pp. 21
Author(s):  
Kremema Star ◽  
Barbara Birshtein

The Human Genome Project created the field of genomics – the study of genetic material on a large scale. Scientists are deciphering the information held within the sequence of our genome. By building upon this knowledge, physicians and scientists will create fundamental new technologies to understand the contribution of genetics to the diagnosis, prognosis, monitoring, and treatment of human disease. The science of genomic medicine has only begun to affect our understanding of health.

