A U-statistics for integrative analysis of multilayer omics data

2020 ◽  
Vol 36 (8) ◽  
pp. 2365-2374
Author(s):  
Xiaqiong Wang ◽  
Yalu Wen

Abstract Motivation The emerging multilayer omics data provide unprecedented opportunities for detecting biomarkers that are associated with complex diseases at various molecular levels. However, the high dimensionality of multiomics data and the complex disease etiologies have brought tremendous analytical challenges. Results We developed a U-statistics-based non-parametric framework for the association analysis of multilayer omics data, where consensus and permutation-based weighting schemes are developed to account for various types of disease models. Our proposed method is flexible for analyzing different types of outcomes as it makes no assumptions about their distributions. Moreover, it explicitly accounts for various types of underlying disease models through weighting schemes and thus provides robust performance against them. Through extensive simulations and the application to a dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we demonstrated that our method outperformed the commonly used kernel regression-based methods. Availability and implementation The R-package is available at https://github.com/YaluWen/Uomic. Supplementary information Supplementary data are available at Bioinformatics online.
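As a rough illustration of what a U-statistic-based association test can look like, the sketch below averages, over all pairs of subjects, the product of a phenotype similarity and an omics similarity, and then calibrates the result by permuting the outcomes. The similarity functions, the permutation p-value and the function names `pairwise_u` and `permutation_p_value` are assumptions made for this example; they are not the Uomic package API and do not reproduce the consensus or permutation-based weighting schemes described in the abstract.

```python
import random
from itertools import combinations


def pairwise_u(y, x):
    """Average over subject pairs of (phenotype similarity * omics similarity).

    y: list of outcomes, one per subject.
    x: list of feature vectors, one per subject (assumed column-centered).
    Both similarity choices are illustrative assumptions.
    """
    n = len(y)
    y_mean = sum(y) / n
    pairs = list(combinations(range(n), 2))
    total = 0.0
    for i, j in pairs:
        pheno_sim = (y[i] - y_mean) * (y[j] - y_mean)      # cross-product similarity
        omic_sim = sum(a * b for a, b in zip(x[i], x[j]))  # linear (inner-product) similarity
        total += pheno_sim * omic_sim
    return total / len(pairs)


def permutation_p_value(y, x, n_perm=999, seed=0):
    """One-sided permutation p-value: shuffle outcomes to break any association."""
    rng = random.Random(seed)
    observed = pairwise_u(y, x)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if pairwise_u(y_perm, x) >= observed:
            hits += 1
    # The +1 terms count the observed statistic as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    outcomes = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    features = [[v - 4.5] for v in outcomes]  # one centered feature tracking the outcome
    stat, p = permutation_p_value(outcomes, features)
    print(stat, p)
```

Because the statistic only uses pairwise similarities, the outcome can be swapped for any type for which a similarity can be defined, which is the sense in which such tests avoid distributional assumptions.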

Author(s):  
Yang Hai ◽  
Yalu Wen

Abstract Motivation Accurate disease risk prediction is essential for precision medicine. Existing models assume that diseases are caused either by groups of predictors with small-to-moderate effects or by a few isolated predictors with large effects. Their performance can be sensitive to the underlying disease mechanisms, which are usually unknown in advance. Results We developed a Bayesian linear mixed model (BLMM), where genetic effects were modelled using a hybrid of the sparsity regression and linear mixed model with multiple random effects. The parameters in BLMM were inferred through a computationally efficient variational Bayes algorithm. The proposed method can resemble the shape of the true effect size distributions and capture the predictive effects from both common and rare variants, and it is robust against various disease models. Through extensive simulations and the application to a whole-genome sequencing dataset obtained from the Alzheimer’s Disease Neuroimaging Initiative, we have demonstrated that BLMM has better prediction performance than existing methods and can detect variables and/or genetic regions that are predictive. Availability The R-package is available at https://github.com/yhai943/BLMM. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (3) ◽  
pp. 842-850 ◽  
Author(s):  
Cheng Peng ◽  
Jun Wang ◽  
Isaac Asante ◽  
Stan Louie ◽  
Ran Jin ◽  
...  

Abstract Motivation Epidemiologic, clinical and translational studies are increasingly generating multiplatform omics data. Methods that can integrate across multiple high-dimensional data types while accounting for differential patterns are critical for uncovering novel associations and underlying relevant subgroups. Results We propose an integrative model to estimate latent unknown clusters (LUCID), aiming to distinguish unique genomic, exposure and informative biomarkers/omic effects while jointly estimating subgroups relevant to the outcome of interest. Simulation studies indicate that we can obtain consistent estimates reflective of the true simulated values, accurately estimate subgroups and recapitulate subgroup-specific effects. We also demonstrate the use of the integrated model for future prediction of risk subgroups and phenotypes. We apply this approach to two real data applications to highlight the integration of genomic, exposure and metabolomic data. Availability and Implementation The LUCID method is implemented through the LUCIDus R package available on CRAN (https://CRAN.R-project.org/package=LUCIDus). Supplementary information Supplementary materials are available at Bioinformatics online.


2019 ◽  
Vol 36 (6) ◽  
pp. 1785-1794
Author(s):  
Jun Li ◽  
Qing Lu ◽  
Yalu Wen

Abstract Motivation The use of human genome discoveries and other established factors to build an accurate risk prediction model is an essential step toward precision medicine. While multi-layer high-dimensional omics data provide unprecedented data resources for prediction studies, their corresponding analytical methods are much less developed. Results We present a multi-kernel penalized linear mixed model with adaptive lasso (MKpLMM), a predictive modeling framework that extends the standard linear mixed models widely used in genomic risk prediction, for multi-omics data analysis. MKpLMM can capture not only the predictive effects from each layer of omics data but also their interactions by using multiple kernel functions. It adopts a data-driven approach to select predictive regions as well as predictive layers of omics data, and achieves robust selection performance. Through extensive simulation studies, the analyses of PET-imaging outcomes from the Alzheimer’s Disease Neuroimaging Initiative study, and the analyses of 64 drug responses, we demonstrate that MKpLMM consistently outperforms competing methods in phenotype prediction. Availability and implementation The R-package is available at https://github.com/YaluWen/OmicPred. Supplementary information Supplementary data are available at Bioinformatics online.
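To make the idea of "multiple kernel functions" concrete, the sketch below builds one linear kernel per omics layer and adds an element-wise (Hadamard) product of two layer kernels as a stand-in for a between-layer interaction term. This is a minimal illustration under stated assumptions: the linear kernel, the Hadamard interaction, the fixed weights and all function names are choices made for this example, not the kernels, penalties or API of MKpLMM or the OmicPred package.

```python
def linear_kernel(layer):
    """Subject-by-subject similarity for one omics layer (rows = subjects)."""
    n_features = len(layer[0])
    return [
        [sum(a * b for a, b in zip(r1, r2)) / n_features for r2 in layer]
        for r1 in layer
    ]


def hadamard(k1, k2):
    """Element-wise product of two kernels, used here as an interaction kernel."""
    return [[a * b for a, b in zip(row1, row2)] for row1, row2 in zip(k1, k2)]


def combine(kernels, weights):
    """Weighted sum of kernels; in a mixed model the weights would be
    variance components estimated from data (fixed here for illustration)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    expression = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 subjects, 2 features
    methylation = [[2.0], [0.0], [1.0]]                # 3 subjects, 1 feature
    k_expr = linear_kernel(expression)
    k_meth = linear_kernel(methylation)
    k_total = combine([k_expr, k_meth, hadamard(k_expr, k_meth)], [1.0, 1.0, 1.0])
    for row in k_total:
        print(row)
```

Setting a layer's weight to zero removes that layer from the combined similarity, which is the intuition behind selecting predictive layers through a penalty on the kernel weights.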


2019 ◽  
Vol 35 (24) ◽  
pp. 5182-5190 ◽  
Author(s):  
Luis G Leal ◽  
Alessia David ◽  
Marjo-Riitta Jarvelin ◽  
Sylvain Sebert ◽  
Minna Männikkö ◽  
...  

Abstract Motivation Integration of different omics data could markedly help to identify biological signatures, understand the missing heritability of complex diseases and ultimately achieve personalized medicine. Standard regression models used in Genome-Wide Association Studies (GWAS) identify loci with a strong effect size, whereas GWAS meta-analyses are often needed to capture weak loci contributing to the missing heritability. Development of novel machine learning algorithms for merging genotype data with other omics data is highly needed as it could enhance the prioritization of weak loci. Results We developed cNMTF (corrected non-negative matrix tri-factorization), an integrative algorithm based on clustering techniques of biological data. This method assesses the inter-relatedness between genotypes, phenotypes, the damaging effect of the variants and gene networks in order to identify loci-trait associations. cNMTF was used to prioritize genes associated with lipid traits in two population cohorts. We replicated 129 genes reported in GWAS world-wide and provided evidence that supports 85% of our findings (226 out of 265 genes), including recent associations in literature (NLGN1), regulators of lipid metabolism (DAB1) and pleiotropic genes for lipid traits (CARM1). Moreover, cNMTF performed efficiently against strong population structures by accounting for the individuals’ ancestry. As the method is flexible in the incorporation of diverse omics data sources, it can be easily adapted to the user’s research needs. Availability and implementation An R package (cnmtf) is available at https://lgl15.github.io/cnmtf_web/index.html. Supplementary information Supplementary data are available at Bioinformatics online.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Abel Torres-Espín ◽  
Austin Chou ◽  
J Russell Huie ◽  
Nikos Kyritsis ◽  
Pavan S Upadhyayula ◽  
...  

Biomedical data are usually analyzed at the univariate level, focused on a single primary outcome measure to provide insight into systems biology, complex disease states, and precision medicine opportunities. More broadly, these complex biological and disease states can be detected as common factors emerging from the relationships among measured variables using multivariate approaches. ‘Syndromics’ refers to an analytical framework for measuring disease states using principal component analysis and related multivariate statistics as primary tools for extracting underlying disease patterns. A key part of the syndromic workflow is the interpretation, the visualization, and the study of robustness of the main components that characterize the disease space. We present a new software package, syndRomics, an open-source R package with utility for component visualization, interpretation, and stability for syndromic analysis. We document the implementation of syndRomics and illustrate the use of the package in case studies of neurological trauma data.
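The core computation behind such a workflow, extracting a principal component and inspecting its loadings, can be sketched in a few lines. The example below finds the first principal component by power iteration on the sample covariance matrix and reports the share of variance it explains. It is a bare-bones illustration written for this listing; the function name is hypothetical, and it does not use or mirror the syndRomics package, which is written in R.

```python
def first_principal_component(data, n_iter=200):
    """Return (loadings, explained_variance_ratio) for the first PC.

    data: list of rows (subjects), each a list of measured variables.
    Uses power iteration; the starting vector is an arbitrary choice and
    the method fails if it happens to be orthogonal to the leading component.
    """
    n, p = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[v - m for v, m in zip(row, means)] for row in data]
    # Sample covariance matrix (p x p).
    cov = [
        [sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    v = [1.0 / (k + 1) for k in range(p)]  # deliberately non-uniform start
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        v = [c / norm for c in w]
    # Rayleigh quotient gives the eigenvalue (variance along the component).
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_variance = sum(cov[a][a] for a in range(p))
    return v, eigenvalue / total_variance


if __name__ == "__main__":
    # Two variables that move together: the second is twice the first.
    measurements = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
    loadings, explained = first_principal_component(measurements)
    print(loadings, explained)
```

In a syndromic reading, the loadings indicate which measured variables move together along the component, and re-running the extraction on resampled data is one way to probe how stable that pattern is.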


2019 ◽  
Author(s):  
Sebastian J Teran Hidalgo ◽  
Mengyun Wu ◽  
Shuangge Ma

Abstract Summary Multilayer omics profiling has become a major venue for understanding complex diseases. We develop NCutYX, an R package for clustering analysis of multilayer omics data. The package and methods jointly analyze multiple layers of omics measurements and effectively accommodate their regulations. They systematically conduct a series of analyses based on the normalized cut technique, including the clusterings of subjects and omics measurements and biclustering. The package can be valuable for its timely context, novel methods, and comprehensiveness. Availability https://cran.r-project.org/web/packages/NCutYX/. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Noemi Di Nanni ◽  
Matteo Gnocchi ◽  
Marco Moscatelli ◽  
Luciano Milanesi ◽  
Ettore Mosca

Abstract Motivation Multi-omics approaches offer the opportunity to reconstruct a more complete picture of the molecular events associated with human diseases, but pose challenges in data analysis. Network-based methods for the analysis of multi-omics leverage the complex web of macromolecular interactions occurring within cells to extract significant patterns of molecular alterations. Existing network-based approaches typically address specific combinations of omics and are limited in terms of the number of layers that can be jointly analysed. In this study, we investigate the application of network diffusion to quantify gene relevance on the basis of multiple lines of evidence (layers). Results We introduce a gene score (mND) that quantifies the relevance of a gene in a biological process taking into account the network proximity of the gene and its first neighbours to other altered genes. We show that mND performs better than existing methods in finding altered genes in network proximity in one or more layers. We also report good performance in recovering known cancer genes. The pipeline described in this article is broadly applicable, because it can handle different types of inputs: in addition to multi-omics datasets, datasets that are stratified in many classes (e.g., cell clusters emerging from single cell analyses) or a combination of the two scenarios. Availability and implementation The R package ‘mND’ is available at URL: https://www.itb.cnr.it/mnd. Supplementary information Supplementary data are available at Bioinformatics online.
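Network diffusion itself can be illustrated with a short, generic sketch: initial gene scores are repeatedly spread to neighbours over a normalized adjacency matrix while a fraction of the original signal is retained at each step. The update rule, the `alpha` value and the function name `diffuse` are assumptions chosen for this example; the code shows only the diffusion step for a single layer and is not the mND score or the mND package interface.

```python
def diffuse(adjacency, initial_scores, alpha=0.7, n_iter=100):
    """Iterate x <- alpha * W x + (1 - alpha) * x0 on a gene network.

    adjacency: symmetric 0/1 matrix (list of lists) of gene-gene interactions.
    initial_scores: per-gene evidence for one layer (e.g. 1 for altered genes).
    W is the symmetrically normalized adjacency D^-1/2 A D^-1/2.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    w = [
        [
            adjacency[i][j] / (degree[i] * degree[j]) ** 0.5 if adjacency[i][j] else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]
    x = list(initial_scores)
    for _ in range(n_iter):
        x = [
            alpha * sum(w[i][j] * x[j] for j in range(n))
            + (1 - alpha) * initial_scores[i]
            for i in range(n)
        ]
    return x


if __name__ == "__main__":
    # Path network g0 - g1 - g2 - g3 with a single altered gene, g0.
    network = [
        [0, 1, 0, 0],
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0],
    ]
    scores = diffuse(network, [1.0, 0.0, 0.0, 0.0])
    print(scores)
```

Genes closer to the altered gene end up with higher diffused scores, which is the sense in which diffusion captures network proximity; running the same procedure on each layer yields one score vector per layer that a multi-layer method can then combine.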


2018 ◽  
Author(s):  
Nanne Aben ◽  
Johan A. Westerhuis ◽  
Yipeng Song ◽  
Henk A.L. Kiers ◽  
Magali Michaut ◽  
...  

Abstract Motivation In biology, we are often faced with multiple datasets recorded on the same set of objects, such as multi-omics and phenotypic data of the same tumors. These datasets are typically not independent from each other. For example, methylation may influence gene expression, which may, in turn, influence drug response. Such relationships can strongly affect analyses performed on the data, as we have previously shown for the identification of biomarkers of drug response. Therefore, it is important to be able to chart the relationships between datasets. Results We present iTOP, a methodology to infer a topology of relationships between datasets. We base this methodology on the RV coefficient, a measure of matrix correlation, which can be used to determine how much information is shared between two datasets. We extended the RV coefficient for partial matrix correlations, which allows the use of graph reconstruction algorithms, such as the PC algorithm, to infer the topologies. In addition, since multi-omics data often contain binary data (e.g. mutations), we also extended the RV coefficient for binary data. Applying iTOP to pharmacogenomics data, we found that gene expression acts as a mediator between most other datasets and drug response: only proteomics clearly shares information with drug response that is not present in gene expression. Based on this result, we used TANDEM, a method for drug response prediction, to identify which variables predictive of drug response were distinct to either gene expression or proteomics. Availability An implementation of our methodology is available in the R package iTOP on CRAN. Additionally, an R Markdown document with code to reproduce all figures is provided as Supplementary Material. Supplementary information Supplementary data are available at Bioinformatics online.
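For readers unfamiliar with the RV coefficient, the standard (unmodified) definition for two column-centered data matrices X and Y measured on the same objects is RV = tr(XXᵀYYᵀ) / sqrt(tr((XXᵀ)²) · tr((YYᵀ)²)). The sketch below computes exactly that and nothing more; the partial and binary extensions described in the abstract are not implemented, and the function names are illustrative rather than taken from the iTOP package.

```python
def center_columns(matrix):
    """Subtract each column's mean (rows = objects, columns = variables)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def gram(matrix):
    """Object-by-object configuration matrix M M^T."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in matrix] for r1 in matrix]


def trace_of_product(a, b):
    """tr(A B) for two square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def rv_coefficient(x, y):
    """Standard RV coefficient between two datasets on the same objects.

    Raises ZeroDivisionError if either dataset has no variance at all.
    """
    sx = gram(center_columns(x))
    sy = gram(center_columns(y))
    denominator = (trace_of_product(sx, sx) * trace_of_product(sy, sy)) ** 0.5
    return trace_of_product(sx, sy) / denominator


if __name__ == "__main__":
    expression = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
    drug_response = [[0.5], [1.0], [0.2], [0.9]]
    print(rv_coefficient(expression, expression))     # identical datasets share everything
    print(rv_coefficient(expression, drug_response))  # value between 0 and 1
```

Because the coefficient is computed from object-by-object configuration matrices, the two datasets may have different numbers of variables as long as the rows refer to the same objects.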


Author(s):  
Emma H Gail ◽  
Anup D Shah ◽  
Ralf B Schittenhelm ◽  
Chen Davidovich

Abstract Summary Unbiased detection of protein–protein and protein–RNA interactions within ribonucleoprotein complexes is enabled through crosslinking followed by mass spectrometry. Yet, different methods detect different types of molecular interactions and therefore require the usage of different software packages with limited compatibility. We present crisscrosslinkeR, an R package that maps both protein–protein and protein–RNA interactions detected by different types of approaches for crosslinking with mass spectrometry. crisscrosslinkeR produces output files that are compatible with visualization using popular software packages for the generation of publication-quality figures. Availability and implementation crisscrosslinkeR is a free and open-source package, available through GitHub: github.com/egmg726/crisscrosslinker. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Antoine Bodein ◽  
Marie-Pier Scott-Boyer ◽  
Olivier Perin ◽  
Kim-Anh Lê Cao ◽  
Arnaud Droit

Abstract Motivation Multi-omics data integration enables the global analysis of biological systems and discovery of new biological insights. Multi-omics experimental designs have been further extended with a longitudinal dimension to study dynamic relationships between molecules. However, methods that integrate longitudinal multi-omics data are still in their infancy. Results We introduce the R package timeOmics, a generic analytical framework for the integration of longitudinal multi-omics data. The framework includes pre-processing, modeling and clustering to identify molecular features strongly associated with time. We illustrate this framework in a case study to detect seasonal patterns of mRNA, metabolites, gut taxa and clinical variables in patients with diabetes mellitus from the integrative Human Microbiome Project. Availability and implementation timeOmics is available on Bioconductor and github.com/abodein/timeOmics. Supplementary information Supplementary data are available at Bioinformatics online.

