Muon: multimodal omics analysis framework

Mapping Intimacies ◽

10.1101/2021.06.01.445670 ◽

2021 ◽

Author(s):

Danila Bredikhin ◽

Ilia Kats ◽

Oliver Stegle

Keyword(s):

Data Structure ◽

Easy Access ◽

Omics Data ◽

Analysis Framework ◽

Multimodal Data ◽

Data Infrastructure ◽

Omics Technologies ◽

Data Standard ◽

Basic Biology ◽

Rich Data

Advances in multi-omics technologies have led to an explosion of multimodal datasets to address questions ranging from basic biology to translation. While these rich data provide major opportunities for discovery, they also come with data management and analysis challenges, motivating the development of tailored computational solutions for multi-omics data. Here, we present MUON, a data standard and analysis framework for multi-omics designed to organise, analyse, visualise, and exchange multimodal data. MUON stores multimodal data in an efficient yet flexible data structure that supports an arbitrary number of omics layers. The MUON data structure is interoperable with existing community standards for single omics, and it provides easy access both to data from individual omics and to multimodal data views. Building on this data infrastructure, MUON enables a versatile range of analyses, from data preprocessing and the construction of multi-omics containers to flexible multi-omics alignment.

Machine learning for single cell genomics data analysis

10.1101/2021.02.04.429763 ◽

2021 ◽

Author(s):

Félix Raimundo ◽

Laetitia Papaxanthos ◽

Céline Vallot ◽

Jean-Philippe Vert

Keyword(s):

Machine Learning ◽

Single Cell ◽

Network Inference ◽

Method Development ◽

Biological Knowledge ◽

Omics Data ◽

Gene Regulatory Network Inference ◽

Multimodal Data ◽

Low Dimensional ◽

Type Classification

Abstract Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic or epigenomic profiles of many individual cells in parallel. In order to infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used due to their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based method development for low-dimensional representations of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference or multimodal data integration. To help readers navigate this fast-moving literature, we survey in this review recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed publications from the last two years (2019-2020).

INTEGRATION BETWEEN SURFACE AND SUBSURFACE SPATIAL OBJECTS FOR DEVELOPING OMAN 3D SDI BASED ON THE CITYGML STANDARD

ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-archives-xlii-4-w16-79-2019 ◽

2019 ◽

Vol XLII-4/W16 ◽

pp. 79-84

Author(s):

K. Al Kalbani ◽

A. Abdul Rahman

Keyword(s):

Data Structure ◽

Spatial Data ◽

3D Model ◽

Infrastructure Management ◽

Data Infrastructure ◽

3D City Models ◽

Spatial Objects ◽

The World ◽

Geospatial Tools ◽

City Models

Abstract. The paper investigates the capability to integrate the data structures of surface and subsurface 3D spatial objects within a 3D spatial data infrastructure (3D SDI) based on the CityGML standard. A number of countries around the world have started applying 3D city models for planning and infrastructure management, while others are still working toward a 3D SDI using the CityGML standard. Most of these initiatives focus on surface spatial objects, with less interest in modelling subsurface spatial objects. However, a 3D SDI requires both surface and subsurface spatial objects, with clear consideration of the issues and challenges in terms of the data structure. The study used geospatial tools and databases such as FME, PostgreSQL-PostGIS, and 3D City Database to generate the 3D model and to test the capability for integrating the surface and subsurface 3D spatial objects data structure within the 3D SDI. The paper concludes by describing a framework that aims to integrate the surface and subsurface 3D geospatial objects data structure in the Oman SDI. The authors believe that there are possible solutions based on the CityGML standard for surface and subsurface 3D spatial objects. Moreover, solving the issues in the data structure can establish a better vision and open new avenues for the 3D SDI.

Deep Pathway Analysis V2.0: A Pathway Analysis Framework Incorporating Multi-dimensional Omics Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2019.2945959 ◽

2019 ◽

pp. 1-1 ◽

Cited By ~ 1

Author(s):

Yue Zhao ◽

Dong-Guk Shin

Keyword(s):

Pathway Analysis ◽

Omics Data ◽

Analysis Framework

Database Resources of the National Genomics Data Center, China National Center for Bioinformation in 2021

Nucleic Acids Research ◽

10.1093/nar/gkaa1022 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D18-D28

Author(s):

Yongbiao Xue ◽

Yiming Bao ◽

Zhang Zhang ◽

Wenming Zhao ◽

...

Keyword(s):

Gene Expression ◽

Data Center ◽

Database Search ◽

Easy Access ◽

Circular Rnas ◽

Omics Data ◽

Plant Resources ◽

The Past ◽

Research Activities ◽

One Stop

Abstract The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.

UCSCXenaShiny: An R Package for Exploring and Analyzing UCSC Xena Public Datasets in Web Browser

10.20944/preprints202007.0179.v1 ◽

2020 ◽

Author(s):

Shixiang Wang ◽

Yi Xiong ◽

Kai Gu ◽

Longfei Zhao ◽

Yin Li ◽

...

Keyword(s):

R Package ◽

Data Availability ◽

Analysis Tool ◽

Omics Data ◽

Analysis Framework ◽

Web Browser ◽

Research Opportunities ◽

Public Projects ◽

R Shiny ◽

Public Datasets

Motivation: The UCSC Xena platform provides huge amounts of processed cancer omics data from big public projects like TCGA or from individual research groups, enabling unprecedented research opportunities. In 2019, we developed UCSCXenaTools, an R package for retrieval of UCSC Xena data. However, an easier dataset exploration and analysis tool is still lacking, especially for researchers without programming experience. Results: We develop UCSCXenaShiny, an R Shiny package to quickly explore and download all datasets from UCSC Xena data hubs. In addition, a module-based analysis framework is constructed to analyze and visualize data. Availability: https://github.com/openbiox/UCSCXenaShiny or https://cran.r-project.org/package=UCSCXenaShiny.

Interpreting and integrating big data in the life sciences

10.7287/peerj.preprints.27603v2 ◽

2019 ◽

Author(s):

Serghei Mangul

Keyword(s):

Big Data ◽

Life Science ◽

Life Sciences ◽

Reproducible Research ◽

Computational Techniques ◽

Omics Data ◽

Omics Technologies ◽

Domains Of Life ◽

The Many ◽

Review Current

Recent advances in omics technologies have led to the broad applicability of computational techniques across various domains of life science and medical research. These technologies provide an unprecedented opportunity to collect omics data from hundreds of thousands of individuals and to study gene-disease association without the aid of prior assumptions about the trait biology. Despite the many advantages of modern omics technologies, interpreting the big data they produce requires advanced computational algorithms. Below I outline key challenges that biomedical researchers face when interpreting and integrating big omics data. I discuss the reproducibility aspect of big data analysis in the life sciences and review current practices in reproducible research. Finally, I explain the skills that biomedical researchers need to acquire in order to independently analyze big omics data.

From ArrayExpress to BioStudies

Nucleic Acids Research ◽

10.1093/nar/gkaa1062 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D1502-D1506

Author(s):

Ugis Sarkans ◽

Anja Füllgrabe ◽

Ahmed Ali ◽

Awais Athar ◽

Ehsan Behrangi ◽

...

Keyword(s):

Functional Genomics ◽

Microarray Data ◽

Archival Data ◽

Central Concept ◽

Multimodal Data ◽

Online Tool ◽

Technical Aspects ◽

Data Infrastructure ◽

European Nucleotide Archive ◽

Programmatic Access

Abstract ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments has involved multiple technologies assaying different biological modalities, such as epigenetics and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.

Faculty Opinions recommendation of Handling missing rows in multi-omics data integration: multiple imputation in multiple factor analysis framework.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.726822656.793529462 ◽

2017 ◽

Author(s):

Charles Auffray ◽

Bertrand De Meulder ◽

Manlio Vinciguerra

Keyword(s):

Factor Analysis ◽

Data Integration ◽

Multiple Imputation ◽

Omics Data ◽

Analysis Framework ◽

Multiple Factor Analysis ◽

Multiple Factor ◽

Omics Data Integration

Introducing digital information products of the four GeoERA groundwater projects for assessment and sustainable use of water resources and the subsurface in a changing climate

10.5194/egusphere-egu21-11944 ◽

2021 ◽

Author(s):

Klaus Hinsby ◽

Laurence Gourcy ◽

Hans Peter Broers ◽

Anker Lajer Højberg ◽

Marco Bianchi ◽

...

Keyword(s):

Sustainable Development ◽

Water Resources ◽

Digital Data ◽

Easy Access ◽

Digital Information ◽

Data Infrastructure ◽

Information Platform ◽

Changing Climate ◽

Mitigation And Adaptation ◽

Groundwater Quantity

Sustainable evolution of groundwater quantity and quality is essential for sustainable development and protection of society and nature, globally, as acknowledged in the UN sustainable development goals and the European Green Deal. Too much? – too little? – and/or too polluted? are important questions to pose and answer in a changing climate with increasing pressures on water resources, severe loss of biodiversity, and a projected increase in extreme events resulting in an increasing risk of floods, droughts, landslides and land subsidence.

Easy access to digital and FAIR (Findable, Accessible, Interoperable and Reusable) data on groundwater quantity and quality is imperative for informed decision making and efficient climate change mitigation and adaptation to which sustainable groundwater management will contribute. Here we briefly present selected highlights and digital data products from the four GeoERA groundwater projects developed for and made available on the digital subsurface information platform of the European geological survey organizations. The ambition is to develop the digital information platform, EGDI (the European Geological Data Infrastructure) as the leading information platform for sustainable and integrated management of subsurface resources in Europe and one of the leading platforms, globally.

R-ODAF: Omics data analysis framework for regulatory application

Toxicology Letters ◽

10.1016/s0378-4274(21)00539-7 ◽

2021 ◽

Vol 350 ◽

pp. S124

Author(s):

M.C. Verheijen ◽

T.W. Gant ◽

W. Tong ◽

F. Caiment

Keyword(s):

Data Analysis ◽

Omics Data ◽

Analysis Framework ◽

Omics Data Analysis
