Tailored graphical lasso for data integration in gene network reconstruction

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilla Lingjærde ◽  
Tonje G. Lien ◽  
Ørnulf Borgan ◽  
Helga Bergholtz ◽  
Ingrid K. Glad

Abstract. Background: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using $L_1$-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are, however, issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively. Results: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein–protein interaction networks. Conclusions: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
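
For orientation, the two baseline estimators named in the abstract are usually written as the penalized likelihood problems below. This is a sketch of the standard textbook formulations, not the paper's own notation: the symbols $S$ (sample covariance matrix), $\Theta$ (inverse covariance matrix), $\lambda$ (penalty parameter) and $P$ (matrix of prior weights) are assumptions introduced here. The penalty used by the tailored graphical lasso is not stated in the abstract and is therefore not reproduced.

```latex
% Graphical lasso: L1-penalized Gaussian log-likelihood
\hat{\Theta}_{\mathrm{glasso}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1 \right\}

% Weighted graphical lasso: entry-wise prior weights P = (p_{ij}) scale the penalty
\hat{\Theta}_{\mathrm{wglasso}}
  = \arg\max_{\Theta \succ 0}
    \left\{ \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert P \circ \Theta \rVert_1 \right\}
```

Here $\lVert \cdot \rVert_1$ is the sum of absolute entries and $\circ$ is the entry-wise product; whether the diagonal is penalized varies between implementations. An edge between genes $i$ and $j$ is included when the estimated entry $\hat{\theta}_{ij}$ is non-zero, and setting every $p_{ij} = 1$ recovers the unweighted graphical lasso.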

2020 ◽  
Author(s):  
Camilla Lingjærde ◽  
Tonje G Lien ◽  
Ørnulf Borgan ◽  
Ingrid K Glad

Abstract. Background: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in Gaussian graphical models in such situations, using L1-penalization on the matrix entries. An extension of the graphical lasso is the weighted graphical lasso, in which prior biological information from other (data) sources is integrated into the model through the weights. There are, however, issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, weighted graphical lasso often fails to utilize the information effectively. Results: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method, tailoredGlasso. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. With our method, mRNA data are demonstrated to provide highly useful prior information for protein-protein interaction networks. Conclusions: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.


2013 ◽  
Vol 7 ◽  
pp. BBI.S12932 ◽  
Author(s):  
Sam Ansari ◽  
Jean Binder ◽  
Stephanie Boue ◽  
Anselmo Di Fabio ◽  
William Hayes ◽  
...  

Biological networks with a structured syntax are a powerful way of representing biological information generated from high-density data; however, they can become unwieldy to manage as their size and complexity increase. This article presents a crowd-verification approach for the visualization and expansion of biological networks. Web-based graphical interfaces allow visualization of causal and correlative biological relationships represented using Biological Expression Language (BEL). Crowdsourcing principles enable participants to communally annotate these relationships based on evidence from the literature. Gamification principles are incorporated to further engage domain experts throughout biology to gather robust peer-reviewed information from which relationships can be identified and verified. The resulting network models will represent the current status of biological knowledge within the defined boundaries, here processes related to human lung disease. These models are amenable to computational analysis. For some period following conclusion of the challenge, the published models will remain available for continuous use and expansion by the scientific community.


Author(s):  
Jonas M. B. Haslbeck

Abstract. Statistical network models such as the Gaussian Graphical Model and the Ising model have become popular tools to analyze multivariate psychological datasets. In many applications, the goal is to compare such network models across groups. In this paper, I introduce a method to estimate group differences in network models that is based on moderation analysis. This method is attractive because it allows one to make comparisons across more than two groups for all parameters within a single model and because it is implemented for all commonly used cross-sectional network models. Next to introducing the method, I evaluate the performance of the proposed method and existing approaches in a simulation study. Finally, I provide a fully reproducible tutorial on how to use the proposed method to compare a network model across three groups using the R-package mgm.


2020 ◽  
Author(s):  
Jonas M B Haslbeck

Statistical network models such as the Gaussian Graphical Model and the Ising model have become popular tools to analyze multivariate psychological data sets. In many applications, the goal is to compare such network models across groups. In this paper, I introduce a method to estimate differences in network models across groups that is based on moderation analysis. This method is attractive because it allows one to make comparisons across more than two groups within a single model, and because it is implemented for all commonly used cross-sectional network models. Next to introducing the method, I evaluate the performance of the proposed method and existing approaches in a simulation study. Finally, I provide a fully reproducible tutorial on how to use the moderation method to compare a network model across three groups using the R-package mgm.


2021 ◽  
Author(s):  
Daniel Redhead ◽  
Richard McElreath ◽  
Cody T. Ross

Social network analysis provides an important framework for studying the causes, consequences, and structure of social ties. However, standard self-report measures (e.g., as collected through the popular ‘name-generator’ method) do not provide an impartial representation of transfers, interactions, or social relationships. At best, they represent perceptions filtered through the cognitive biases of respondents. Individuals may, for example, report transfers that did not really occur, or forget to mention transfers that really did. The propensity to make such reporting inaccuracies is both an individual-level and item-level characteristic, variable across members of any given group. Past research has highlighted that many network-level properties are highly sensitive to such reporting inaccuracies. However, there remains a dearth of easily deployed statistical tools that account for such biases. To address this issue, we introduce a latent network model that allows us to jointly estimate parameters measuring both reporting biases and a latent, underlying social network. Building upon past research, we conduct several simulation experiments in which network data are subject to various reporting biases, and find that these reporting biases strongly impact our ability to accurately infer fundamental network properties. These impacts are not adequately addressed using standard approaches to network reconstruction (i.e., treating either the union or the intersection of double-sampled data as the true network), but are appropriately resolved through the use of our latent network models. To make implementation of our models easier for end-users, we provide a fully-documented R package, STRAND, and include a tutorial illustrating its functionality when applied to empirical food/money sharing data from a rural Colombian population.
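
The two standard reconstruction rules mentioned in the abstract (union and intersection of double-sampled data) can be illustrated with a minimal sketch. The reports below are invented toy data and the function names are hypothetical; this is not the STRAND package's interface, and it does not implement the latent network model.

```python
# Each directed tie (giver, receiver) can be reported by either party.
# Toy reports invented for illustration; they are not data from the paper.
reported_by_giver = {("A", "B"), ("A", "C"), ("B", "C")}
reported_by_receiver = {("A", "B"), ("B", "C"), ("C", "A")}


def union_network(giver_reports, receiver_reports):
    """Keep a tie if at least one of the two parties reported it."""
    return giver_reports | receiver_reports


def intersection_network(giver_reports, receiver_reports):
    """Keep a tie only if both parties reported it."""
    return giver_reports & receiver_reports


union_ties = union_network(reported_by_giver, reported_by_receiver)
intersection_ties = intersection_network(reported_by_giver, reported_by_receiver)

print("union:", sorted(union_ties))
print("intersection:", sorted(intersection_ties))
```

The union rule accepts every reported tie, so it tends to inherit false positives, while the intersection rule discards any tie one party forgot, so it tends to inherit false negatives. Neither rule models the reporting process itself, which is the gap the latent network model is described as addressing.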


2018 ◽  
Vol 19 (5) ◽  
pp. 545-568 ◽  
Author(s):  
Geneviève Robin ◽  
Christophe Ambroise ◽  
Stéphane Robin

Graphical network inference is used in many fields such as genomics or ecology to infer the conditional independence structure between variables, from measurements of gene expression or species abundances for instance. In many practical cases, not all variables involved in the network have been observed, and the samples are actually drawn from a distribution where some variables have been marginalized out. This challenges the sparsity assumption commonly made in graphical model inference, since marginalization yields locally dense structures, even when the original network is sparse. We present a procedure for inferring Gaussian graphical models when some variables are unobserved, that accounts both for the influence of missing variables and the low density of the original network. Our model is based on the aggregation of spanning trees, and the estimation procedure on the expectation-maximization algorithm. We treat the graph structure and the unobserved nodes as missing variables and compute posterior probabilities of edge appearance. To provide a complete methodology, we also propose several model selection criteria to estimate the number of missing nodes. A simulation study and an illustration on flow cytometry data reveal that our method has favourable edge detection properties compared to existing graph inference techniques. The methods are implemented in an R package.


2019 ◽  
pp. 1-9 ◽  
Author(s):  
Jill de Ron ◽  
Eiko I. Fried ◽  
Sacha Epskamp

Abstract. Background: In clinical research, populations are often selected on the sum-score of diagnostic criteria such as symptoms. Estimating statistical models where a subset of the data is selected based on a function of the analyzed variables introduces Berkson's bias, which presents a potential threat to the validity of findings in the clinical literature. The aim of the present paper is to investigate the effect of Berkson's bias on the performance of the two most commonly used psychological network models: the Gaussian Graphical Model (GGM) for continuous and ordinal data, and the Ising Model for binary data. Methods: In two simulation studies, we test how well the two models recover a true network structure when estimation is based on a subset of the data typically seen in clinical studies. The network is based on a dataset of 2807 patients diagnosed with major depression, and nodes in the network are items from the Hamilton Rating Scale for Depression (HRSD). The simulation studies test different scenarios by varying (1) sample size and (2) the cut-off value of the sum-score which governs the selection of participants. Results: The results of both studies indicate that higher cut-off values are associated with worse recovery of the network structure. As expected from the Berkson's bias literature, selection reduced recovery rates by inducing negative connections between the items. Conclusion: Our findings provide evidence that Berkson's bias is a considerable and underappreciated problem in the clinical network literature. Furthermore, we discuss potential solutions to circumvent Berkson's bias and their pitfalls.
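
The mechanism described above, selection on a sum-score inducing negative connections between items, can be reproduced in a toy simulation. This is a minimal sketch under assumptions made here (five independent binary items, each present with probability 0.5, and a cut-off of 3); it is not the paper's simulation design, which is based on an HRSD network.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_people=20000, n_items=5, cutoff=3, seed=1):
    """Correlate items 1 and 2 in the full sample and in the selected subset."""
    rng = random.Random(seed)
    # Items are generated independently, so the true correlation is zero.
    people = [[rng.randint(0, 1) for _ in range(n_items)] for _ in range(n_people)]
    # Clinical-style selection: keep only people whose sum-score reaches the cut-off.
    selected = [p for p in people if sum(p) >= cutoff]
    r_full = pearson([p[0] for p in people], [p[1] for p in people])
    r_selected = pearson([p[0] for p in selected], [p[1] for p in selected])
    return r_full, r_selected


r_full, r_selected = simulate()
print(f"full sample:     r = {r_full:.3f}")
print(f"selected sample: r = {r_selected:.3f}")
```

In the full sample the correlation stays near zero, whereas in the selected subset it turns negative: among people who reached the cut-off, lacking one symptom makes it more likely that the others are present.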


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract. Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 71 (2) ◽  
pp. 301-316
Author(s):  
Reshma Sanjhira

Abstract. We propose a matrix analogue of a general inverse series relation with the objective of introducing the generalized Humbert matrix polynomial, the Wilson matrix polynomial, and the Racah matrix polynomial together with their inverse series representations. The matrix polynomials of Kinney, Pincherle, Gegenbauer, Hahn, Meixner-Pollaczek etc. occur as special cases. It is also shown that the general inverse matrix pair provides an extension of several inverse pairs due to John Riordan [An Introduction to Combinatorial Identities, Wiley, 1968].


Symmetry ◽  
2021 ◽  
Vol 13 (5) ◽  
pp. 870
Author(s):  
Diego Caratelli ◽  
Paolo Emilio Ricci

We show that using Dunford-Taylor’s integral, a classical tool of functional analysis, it is possible to derive an expression for the inverse of a general non-singular complex-valued tridiagonal matrix. The special cases of Jacobi’s symmetric and Toeplitz (in particular symmetric Toeplitz) matrices are included. The proposed method does not require the knowledge of the matrix eigenvalues and relies only on the relevant invariants which are determined, in a computationally effective way, by means of a dedicated recursive procedure. The considered technique has been validated through several test cases with the aid of the computer algebra program Mathematica©.

