Muon: multimodal omics analysis framework

2021 ◽  
Author(s):  
Danila Bredikhin ◽  
Ilia Kats ◽  
Oliver Stegle

Advances in multi-omics technologies have led to an explosion of multimodal datasets to address questions ranging from basic biology to translation. While these rich data provide major opportunities for discovery, they also come with data management and analysis challenges, thus motivating the development of tailored computational solutions to deal with multi-omics data. Here, we present a data standard and an analysis framework for multi-omics - MUON - designed to organise, analyse, visualise, and exchange multimodal data. MUON stores multimodal data in an efficient yet flexible data structure, supporting an arbitrary number of omics layers. The MUON data structure is interoperable with existing community standards for single omics, and it provides easy access both to data from individual omics and to multimodal data views. Building on this data infrastructure, MUON enables a versatile range of analyses, from data preprocessing and the construction of multi-omics containers to flexible multi-omics alignment.
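
The idea of a container holding an arbitrary number of omics layers, with access both to individual layers and to a combined view of the cells, can be illustrated with a small sketch. This is not MUON's implementation or API: the class names `Modality` and `MultimodalContainer`, their attributes, and the toy data are all hypothetical and use only the Python standard library.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Modality:
    """One omics layer (hypothetical): a cells x features matrix with labelled axes."""
    obs_names: List[str]    # cell identifiers
    var_names: List[str]    # feature identifiers (genes, peaks, ...)
    X: List[List[float]]    # one row per cell, one column per feature

    def __post_init__(self) -> None:
        # Reject matrices whose shape disagrees with the axis labels.
        if len(self.X) != len(self.obs_names):
            raise ValueError("X must have one row per cell")
        if any(len(row) != len(self.var_names) for row in self.X):
            raise ValueError("each row of X must have one value per feature")


@dataclass
class MultimodalContainer:
    """Hypothetical container mapping modality names to omics layers."""
    mod: Dict[str, Modality] = field(default_factory=dict)

    def __getitem__(self, name: str) -> Modality:
        # Access to a single omics layer by name.
        return self.mod[name]

    @property
    def obs_names(self) -> List[str]:
        # Union of cells across all layers, in first-seen order.
        seen: Dict[str, None] = {}
        for layer in self.mod.values():
            for cell in layer.obs_names:
                seen.setdefault(cell, None)
        return list(seen)

    def shared_obs(self) -> List[str]:
        # Cells measured in every layer, ordered as in the first layer.
        layers = list(self.mod.values())
        if not layers:
            return []
        common = set(layers[0].obs_names)
        for layer in layers[1:]:
            common &= set(layer.obs_names)
        return [cell for cell in layers[0].obs_names if cell in common]


# Toy data (invented for illustration): two layers with partly overlapping cells.
rna = Modality(["c1", "c2", "c3"], ["GeneA", "GeneB"],
               [[1.0, 0.0], [2.0, 3.0], [0.0, 5.0]])
atac = Modality(["c2", "c3", "c4"], ["peak1"], [[1.0], [0.0], [1.0]])
data = MultimodalContainer({"rna": rna, "atac": atac})

print(data["rna"].var_names)   # features of a single layer
print(data.obs_names)          # all cells seen in any layer
print(data.shared_obs())       # cells present in every layer
```

Keeping each layer self-contained and storing layers in a name-keyed mapping is one simple way to support any number of modalities; the cross-layer views are computed from the layers rather than stored separately.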

2021 ◽  
Author(s):  
Félix Raimundo ◽  
Laetitia Papaxanthos ◽  
Céline Vallot ◽  
Jean-Philippe Vert

Abstract Single-cell omics technologies produce large quantities of data describing the genomic, transcriptomic or epigenomic profiles of many individual cells in parallel. In order to infer biological knowledge and develop predictive models from these data, machine learning (ML)-based models are increasingly used due to their flexibility, scalability, and impressive success in other fields. In recent years, we have seen a surge of new ML-based methods for low-dimensional representation of single-cell omics data, batch normalization, cell type classification, trajectory inference, gene regulatory network inference and multimodal data integration. To help readers navigate this fast-moving literature, we survey in this review recent advances in ML approaches developed to analyze single-cell omics data, focusing mainly on peer-reviewed publications from the last two years (2019-2020).


Author(s):  
K. Al Kalbani ◽  
A. Abdul Rahman

Abstract. The paper investigates the capability to integrate the data structures of surface and subsurface 3D spatial objects within the 3D spatial data infrastructure (3D SDI) based on the CityGML standards. A number of countries around the world have started applying 3D city models for their planning and infrastructure management, while others are still working toward 3D SDI using CityGML standards. Most of these initiatives focus on surface spatial objects, with less interest in modelling subsurface spatial objects. However, 3D SDI requires both surface and subsurface spatial objects, with clear consideration of the issues and challenges in terms of the data structure. The study used geospatial tools and databases such as FME, PostgreSQL-PostGIS, and 3D City Database to generate the 3D model and to test the capability for integrating the surface and subsurface 3D spatial objects data structure within the 3D SDI. This paper concludes by describing a framework that aims to integrate the surface and subsurface 3D geospatial objects data structure in Oman SDI. The authors believe that there are possible solutions based on CityGML standards for surface and subsurface 3D spatial objects. Moreover, solving the issues in data structure can establish a better vision and open new avenues for the 3D SDI.


2020 ◽  
Vol 49 (D1) ◽  
pp. D18-D28
Author(s):  
Yongbiao Xue ◽  
Yiming Bao ◽  
Zhang Zhang ◽  
Wenming Zhao ◽  
...  

Abstract The National Genomics Data Center (NGDC), part of the China National Center for Bioinformation (CNCB), provides a suite of database resources to support worldwide research activities in both academia and industry. With the explosive growth of multi-omics data, CNCB-NGDC is continually expanding, updating and enriching its core database resources through big data deposition, integration and translation. In the past year, considerable efforts have been devoted to 2019nCoVR, a newly established resource providing a global landscape of SARS-CoV-2 genomic sequences, variants, and haplotypes, as well as Aging Atlas, BrainBase, GTDB (Glycosyltransferases Database), LncExpDB, and TransCirc (Translation potential for circular RNAs). Meanwhile, a series of resources have been updated and improved, including BioProject, BioSample, GWH (Genome Warehouse), GVM (Genome Variation Map), GEN (Gene Expression Nebulas) as well as several biodiversity and plant resources. Particularly, BIG Search, a scalable, one-stop, cross-database search engine, has been significantly updated by providing easy access to a large number of internal and external biological resources from CNCB-NGDC, our partners, EBI and NCBI. All of these resources along with their services are publicly accessible at https://bigd.big.ac.cn.


Author(s):  
Shixiang Wang ◽  
Yi Xiong ◽  
Kai Gu ◽  
Longfei Zhao ◽  
Yin Li ◽  
...  

Motivation: The UCSC Xena platform provides huge amounts of processed cancer omics data from big public projects like TCGA and from individual research groups, enabling unprecedented research opportunities. In 2019, we developed UCSCXenaTools, an R package for retrieval of UCSC Xena data. However, an easier dataset exploration and analysis tool is still lacking, especially for researchers without programming experience. Results: We develop UCSCXenaShiny, an R Shiny package to quickly explore and download all datasets from UCSC Xena data hubs. In addition, a module-based analysis framework is constructed to analyze and visualize data. Availability: https://github.com/openbiox/UCSCXenaShiny or https://cran.r-project.org/package=UCSCXenaShiny.


2019 ◽  
Author(s):  
Serghei Mangul

Recent advances in omics technologies have led to the broad applicability of computational techniques across various domains of life science and medical research. These technologies provide an unprecedented opportunity to collect omics data from hundreds of thousands of individuals and to study gene-disease association without the aid of prior assumptions about the trait biology. Despite the many advantages of modern omics technologies, interpreting the big data they produce requires advanced computational algorithms. Below I outline key challenges that biomedical researchers face when interpreting and integrating big omics data. I discuss the reproducibility aspect of big data analysis in the life sciences and review current practices in reproducible research. Finally, I explain the skills that biomedical researchers need to acquire in order to independently analyze big omics data.


2020 ◽  
Vol 49 (D1) ◽  
pp. D1502-D1506
Author(s):  
Ugis Sarkans ◽  
Anja Füllgrabe ◽  
Ahmed Ali ◽  
Awais Athar ◽  
Ehsan Behrangi ◽  
...  

Abstract ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI. It was established in 2002, initially as an archive for publication-related microarray data, and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments has involved multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as the European Nucleotide Archive (ENA), and hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.


2021 ◽  
Author(s):  
Klaus Hinsby ◽  
Laurence Gourcy ◽  
Hans Peter Broers ◽  
Anker Lajer Højberg ◽  
Marco Bianchi ◽  
...  

Sustainable evolution of groundwater quantity and quality is essential for sustainable development and protection of society and nature, globally, as acknowledged in the UN sustainable development goals and the European Green Deal. Too much? – too little? – and/or too polluted? are important questions to pose and answer in a changing climate with increasing pressures on water resources, severe loss of biodiversity, and a projected increase in extreme events resulting in an increasing risk of floods, droughts, landslides and land subsidence.

Easy access to digital and FAIR (Findable, Accessible, Interoperable and Reusable) data on groundwater quantity and quality is imperative for informed decision making and efficient climate change mitigation and adaptation to which sustainable groundwater management will contribute. Here we briefly present selected highlights and digital data products from the four GeoERA groundwater projects developed for and made available on the digital subsurface information platform of the European geological survey organizations. The ambition is to develop the digital information platform, EGDI (the European Geological Data Infrastructure) as the leading information platform for sustainable and integrated management of subsurface resources in Europe and one of the leading platforms, globally.


2021 ◽  
Vol 350 ◽  
pp. S124
Author(s):  
M.C. Verheijen ◽  
T.W. Gant ◽  
W. Tong ◽  
F. Caiment
