STATegra: Multi-omics data integration - A conceptual scheme and a bioinformatics pipeline

2020 ◽  
Author(s):  
Nuria Planell ◽  
Vincenzo Lagani ◽  
Patricia Sebastian-Leon ◽  
Frans van der Kloet ◽  
Ewoud Ewing ◽  
...  

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. It is therefore an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining machine learning component analysis, non-parametric data combination and a multi-omics exploratory analysis in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two TCGA case studies. For both the Glioblastoma and the Skin Cutaneous Melanoma cases, we demonstrate an enhanced capacity to identify features compared with single-omics analysis. Such an integrative multi-omics analysis framework for the identification of features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled, and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package https://bioconductor.org/packages/release/bioc/html/STATegra.html.

2021 ◽  
Vol 12 ◽  
Author(s):  
Nuria Planell ◽  
Vincenzo Lagani ◽  
Patricia Sebastian-Leon ◽  
Frans van der Kloet ◽  
Ewoud Ewing ◽  
...  

Technologies for profiling samples using different omics platforms have been at the forefront since the Human Genome Project. Large-scale multi-omics data hold the promise of deciphering different regulatory layers. Yet, while there is a myriad of bioinformatics tools, each multi-omics analysis appears to start from scratch with an arbitrary decision over which tools to use and how to combine them. Therefore, it is an unmet need to conceptualize how to integrate such data and to implement and validate pipelines in different cases. We have designed a conceptual framework (STATegra), aiming for it to be as generic as possible for multi-omics analysis, combining available multi-omics analysis tools (machine learning component analysis, non-parametric data combination, and a multi-omics exploratory analysis) in a step-wise manner. While we have previously combined those integrative tools in several studies, here we provide a systematic description of the STATegra framework and its validation using two case studies from The Cancer Genome Atlas (TCGA). For both the Glioblastoma and the Skin Cutaneous Melanoma (SKCM) cases, we demonstrate an enhanced capacity of the framework, beyond the individual tools, to identify features and pathways compared with single-omics analysis. Such an integrative multi-omics analysis framework for identifying features and components facilitates the discovery of new biology. Finally, we provide several options for applying the STATegra framework when parametric assumptions are fulfilled and for the case when not all the samples are profiled for all omics. The STATegra framework is built using several tools, which are being integrated step-by-step as OpenSource in the STATegRa Bioconductor package.
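The joint component-analysis idea behind this kind of multi-omics integration can be illustrated with a minimal sketch. The code below is not the STATegra implementation (the STATegRa Bioconductor package provides its own component-analysis methods); it simply shows, on invented data, how two omics blocks measured on the same samples can share a latent signal that a joint decomposition recovers, using scikit-learn's CCA as a generic stand-in.

```python
# Minimal sketch of joint component analysis across two omics layers.
# All data are synthetic; CCA is used here only as a generic stand-in
# for the component-analysis step, not as the STATegra method.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_samples = 100

# Simulate a shared biological signal driving both (invented) omics layers.
shared = rng.normal(size=(n_samples, 2))
expression = shared @ rng.normal(size=(2, 20)) + rng.normal(scale=0.5, size=(n_samples, 20))
methylation = shared @ rng.normal(size=(2, 15)) + rng.normal(scale=0.5, size=(n_samples, 15))

# Scale each omics block, then extract paired joint components.
X = StandardScaler().fit_transform(expression)
Y = StandardScaler().fit_transform(methylation)
x_scores, y_scores = CCA(n_components=2).fit_transform(X, Y)

# High correlation between paired scores indicates a component shared
# across the two omics layers.
for k in range(2):
    r = np.corrcoef(x_scores[:, k], y_scores[:, k])[0, 1]
    print(f"joint component {k + 1}: canonical correlation = {r:.2f}")
```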


F1000Research ◽  
2018 ◽  
Vol 7 ◽  
pp. 1968 ◽  
Author(s):  
Roderic Guigo ◽  
Michiel de Hoon

At the beginning of this century, the Human Genome Project produced the first drafts of the human genome sequence. Following this, large-scale functional genomics studies were initiated to understand the molecular basis underlying the translation of the instructions encoded in the genome into the biological traits of organisms. Instrumental in the ensuing revolution in functional genomics were the rapid advances in massively parallel sequencing technologies as well as the development of a wide diversity of protocols that make use of these technologies to understand cellular behavior at the molecular level. Here, we review recent advances in functional genomic methods, discuss some of their current capabilities and limitations, and briefly sketch future directions within the field.


2018 ◽  
Vol 62 (4) ◽  
pp. 563-574 ◽  
Author(s):  
Charlotte Ramon ◽  
Mattia G. Gollub ◽  
Jörg Stelling

At genome scale, it is not yet possible to devise detailed kinetic models for metabolism because data on the in vivo biochemistry are too sparse. Predictive large-scale models for metabolism most commonly use the constraint-based framework, in which network structures constrain possible metabolic phenotypes at steady state. However, these models commonly leave many possibilities open, making them less predictive than desired. With increasingly available omics data, it is appealing to increase the predictive power of constraint-based models (CBMs) through data integration. Many corresponding methods have been developed, but data integration is still a challenge and existing methods perform less well than expected. Here, we review the main approaches for integrating different types of omics data into CBMs, focusing on the methods' assumptions and limitations. We argue that key assumptions – often derived from single-enzyme kinetics – do not generally apply in the context of networks, thereby explaining current limitations. Emerging methods bridging CBMs and biochemical kinetics may allow for omics data integration in a common framework to provide more accurate predictions.
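A minimal sketch of the constraint-based framework described above: flux balance analysis on a toy network, where fluxes must satisfy the steady-state constraint S v = 0 and flux bounds, and an objective flux is maximized by linear programming. The stoichiometry, bounds, and objective here are invented for illustration; genome-scale CBMs involve thousands of reactions and dedicated toolboxes.

```python
# Toy flux balance analysis: maximize an objective flux subject to
# steady-state mass balance (S v = 0) and flux bounds.
import numpy as np
from scipy.optimize import linprog

# Rows = metabolites A and B, columns = reactions:
#   R1: -> A,  R2: A -> B,  R3: B -> (objective, e.g. biomass)
S = np.array([
    [1.0, -1.0,  0.0],   # metabolite A
    [0.0,  1.0, -1.0],   # metabolite B
])

# Flux bounds for each reaction (arbitrary, for illustration).
bounds = [(0.0, 10.0), (0.0, 10.0), (0.0, 10.0)]

# Maximize flux through R3 (linprog minimizes, so negate the objective).
c = np.array([0.0, 0.0, -1.0])

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)
print("maximal objective flux:", -res.fun)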


2021 ◽  
Author(s):  
Matti Hoch ◽  
Suchi Smita Gupta ◽  
Konstantin Cesnulevicius ◽  
David Lescheid ◽  
Myron Schultz ◽  
...  

Disease maps have emerged as computational knowledge bases for exploring and modeling disease-specific molecular processes. By capturing molecular interactions, disease-associated processes, and phenotypes in standardized representations, disease maps provide a platform for applying bioinformatics and systems biology approaches. Applications range from simple map exploration to algorithm-driven target discovery and network perturbation. The web-based MINERVA environment for disease maps provides a platform for developing tools not only for mapping experimental data but also for identifying, analyzing, and simulating disease-specific regulatory networks. We have developed a MINERVA plugin suite based on network topology and enrichment analyses that facilitates multi-omics data integration and enables in silico perturbation experiments on disease maps. We demonstrate workflows by analyzing two RNA-seq datasets on the Atlas of Inflammation Resolution (AIR). Our approach improves usability and increases the functionality of disease maps by providing easy access to available data and integration of self-generated data. It supports efficient and intuitive analysis of omics data, with a focus on disease maps.
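The enrichment component of such analyses can be sketched as a simple over-representation test: is a set of differentially expressed genes over-represented among the genes annotated to one disease-map process? The example below uses a hypergeometric test on invented gene identifiers and set sizes; the MINERVA plugins described above provide their own implementations on real maps.

```python
# Minimal over-representation test with a hypergeometric null.
# Gene identifiers and set sizes are invented for illustration only.
from scipy.stats import hypergeom

# Toy universe of measured genes and a differentially expressed subset.
universe = {f"g{i}" for i in range(1, 201)}                 # 200 measured genes
deg = {"g1", "g2", "g3", "g5", "g8", "g13", "g21", "g34"}   # 8 hits

# A hypothetical disease-map process annotated with 20 genes.
process_genes = {f"g{i}" for i in range(1, 21)}

overlap = deg & process_genes
# P(overlap >= observed) under the hypergeometric null.
p_value = hypergeom.sf(len(overlap) - 1, len(universe), len(process_genes), len(deg))
print(f"overlap = {len(overlap)} genes, enrichment p = {p_value:.3g}")
```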


2006 ◽  
Vol 3 (1) ◽  
pp. 45-55
Author(s):  
P. Romano ◽  
G. Bertolini ◽  
F. De Paoli ◽  
M. Fattore ◽  
D. Marra ◽  
...  

The Human Genome Project has deeply transformed biology, and the field has since expanded to the management, processing, analysis and visualization of large quantities of data from genomics, proteomics, medicinal chemistry and drug screening. This huge amount of data, and the heterogeneity of the software tools in use, implies the adoption on a very large scale of new, flexible tools that enable researchers to integrate data and analyses on the network. ICT standards and tools, like Web Services and related languages, and workflow management systems, can support the creation and deployment of such systems. While a number of Web Services are appearing and personal workflow management systems are increasingly being offered to researchers, a reference portal enabling the vast majority of non-specialist researchers to benefit from these new technologies is still lacking. In this paper, we introduce the rationale for the creation of such a portal and present the architecture and some preliminary results for the development of a portal for the enactment of workflows of interest in oncology.


Author(s):  
Debra J. H. Mathews

Public health genetics (more commonly referred to as "community genetics" in Europe) has been practiced to some degree in the West since at least the 1960s, but the development of a cohesive field took time and advances in technology. The application of genetics and genomics to prevent disease and promote public health became firmly established as a field in the late 1990s, as large-scale sequencing of the human genome began as part of the Human Genome Project. The field is now thriving, bringing both tremendous public health benefits and risks to individuals and populations. This chapter provides an overview of the section of The Oxford Handbook of Public Health Ethics dedicated to public health genetics. The chapters roughly trace the evolution of public health genetics from its roots in eugenics, to the present challenges faced in newborn screening and biobanking, and finally to emerging questions raised by the application of genomics to infectious disease.


2021 ◽  
pp. 13-36
Author(s):  
Christopher L. Cummings ◽  
Kaitlin M. Volk ◽  
Anna A. Ulanova ◽  
Do Thuy Uyen Ha Lam ◽  
Pei Rou Ng

The field of biotechnology has been rigorously researched and applied to many facets of everyday life. Biotechnology is defined as the process of modifying an organism or a biological system for an intended purpose. Biotechnology applications range from agricultural crop selection to pharmaceutical and genetic processes (Bauer and Gaskell 2002). The definition, however, is evolving with recent scientific advancements. Until World War II, biotechnology was primarily siloed in agricultural biology and chemical engineering. The results of this era included disease-resistant crops, pesticides, and other pest-controlling tools (Verma et al. 2011). After WWII, biotechnology began to shift domains when advanced research on human genetics and DNA started. In 1984, the Human Genome Project (HGP) was formally proposed, initiating the pursuit by the private and academic sectors to decode the human genome. The legacy of the project gave rise to ancillary advancements in data sharing and open-source software, and solidified the prominence of "big science": capital-intensive, large-scale private-public research initiatives that were once primarily under the purview of government-funded programs (Hood and Rowen 2013). After the HGP, the biotechnology industry boomed as a result of dramatic cost reductions in DNA sequencing. In 2019, the industry was estimated to be worth $449.06 billion globally and is projected to increase in value (Polaris 2020).


Author(s):  
Wolfgang Wurst ◽  
Achim Gossler

Gene trap (GT) strategies in mouse embryonic stem (ES) cells are increasingly being used for detecting patterns of gene expression (1-4), isolating and mutating endogenous genes (5-7), and identifying targets of signalling molecules and transcription factors (3, 8-10). The general term gene trap refers to the random integration of a reporter gene construct (called an entrapment vector) (11, 12) into the genome such that ‘productive' integration events bring the reporter gene under the transcriptional regulation of an endogenous gene. In some cases this also simultaneously generates an insertional mutation. Entrapment vectors were originally developed in bacteria (13), and applied in Drosophila to identify novel developmental genes and/or regulatory sequences (14-17). Subsequently, a modified strategy was developed for mouse in which the reporter gene mRNA becomes fused to an endogenous transcript. Such ‘gene trap' vectors were initially used primarily as a tool to discover genes involved in development (1, 2, 18). In the last five years there has been a significant shift of GT approaches in mouse to much broader, large-scale applications in the context of the analysis of mammalian genomes and ‘functional genomics'. Sequencing and physical mapping of both the human and mouse genomes are expected to be completed within the next five years. Already, a large number of mouse and human genes have been identified as expressed sequence tags (ESTs), and very likely the majority of genes will be discovered as ESTs shortly. This vast sequence information contrasts with a rather limited understanding of the in vivo functions of these genes. Whereas DNA sequence can provide some indication of the potential functions of these genes and their products, their physiological roles in the organism have to be determined by mutational analysis. Thus, the sequencing effort of the Human Genome Project has to be complemented by efficient functional analyses of the identified genes. One potentially powerful complement to the efforts of the Human Genome Project would be a strategy whereby large-scale random mutagenesis in mouse is combined with the rapid identification of the mutated genes (6, 7, 19, and German gene trap consortium, W. W. unpublished data).


2016 ◽  
Vol 23 (1) ◽  
pp. 21
Author(s):  
Kremema Star ◽  
Barbara Birshtein

The Human Genome Project created the field of genomics – the study of genetic material on a large scale. Scientists are deciphering the information held within the sequence of our genome. By building upon this knowledge, physicians and scientists will create fundamental new technologies to understand the contribution of genetics to the diagnosis, prognosis, monitoring, and treatment of human disease. The science of genomic medicine has only begun to affect our understanding of health.

