Long-tailed graphical model and frequentist inference of the model parameters for biological networks

2020 ◽  
Vol 90 (9) ◽  
pp. 1591-1605 ◽  
Author(s):  
Melih Ağraz ◽  
Vilda Purutçuoğlu
Biostatistics ◽  
2018 ◽  
Author(s):  
Lin Zhang ◽  
Dipankar Bandyopadhyay

Summary: Epidemiological studies on periodontal disease (PD) collect relevant bio-markers, such as the clinical attachment level (CAL) and the probed pocket depth (PPD), at pre-specified tooth sites clustered within a subject’s mouth, along with various other demographic and biological risk factors. Routine cross-sectional evaluations are conducted under a linear mixed model (LMM) framework with underlying normality assumptions on the random terms. However, a careful investigation reveals considerable non-normality manifested in those random terms, in the form of skewness and tail behavior. In addition, PD progression is hypothesized to be spatially-referenced, i.e. disease status at proximal tooth-sites may be different from distally located sites, and tooth missingness is non-random (or informative), given that the number and location of missing teeth informs about the periodontal health in that region. To mitigate these complexities, we consider a matrix-variate skew-$t$ formulation of the LMM with a Markov graphical embedding to handle the site-level spatial associations of the bivariate (PPD and CAL) responses. Within the same framework, the non-randomly missing responses are imputed via a latent probit regression of the missingness indicator over the responses. Our hierarchical Bayesian framework, powered by relevant Markov chain Monte Carlo steps, addresses the aforementioned complexities within a unified paradigm, and estimates model parameters with seamless sharing of information across various stages of the hierarchy. Using both synthetic and real clinical data assessing PD status, we demonstrate a significantly improved fit of our proposition over various other alternative models.


2020 ◽  
Vol 375 (1796) ◽  
pp. 20190661 ◽  
Author(s):  
Danilo Bzdok ◽  
Dorothea L. Floris ◽  
Andre F. Marquand

Network connectivity fingerprints are among today's best choices to obtain a faithful sampling of an individual's brain and cognition. Widely available MRI scanners can provide rich information tapping into network recruitment and reconfiguration that now scales to hundreds and thousands of humans. Here, we contemplate the advantages of analysing such connectome profiles using Bayesian strategies. These analysis techniques afford full probability estimates of the studied network coupling phenomena, provide analytical machinery to separate epistemological uncertainty and biological variability in a coherent manner, usher us towards avenues to go beyond binary statements on existence versus non-existence of an effect, and afford credibility estimates around all model parameters at play which thus enable single-subject predictions with rigorous uncertainty intervals. We illustrate the brittle boundary between healthy and diseased brain circuits by autism spectrum disorder as a recurring theme where, we argue, network-based approaches in neuroscience will require careful probabilistic answers. This article is part of the theme issue ‘Unifying the essential concepts of biological networks: biological insights and philosophical foundations’.


2010 ◽  
Vol 58 (3) ◽  
pp. 393-401 ◽  
Author(s):  
R. Kruse ◽  
M. Steinbrecher

Visual data analysis with computational intelligence methods

Visual data analysis is an appealing and growing field of application. We present two related visual analysis approaches that allow for the visualization of graphical model parameters and time-dependent association rules. When the graphical model is defined over purely nominal attributes, its local structure can be interpreted as an association rule. Such association rules comprise one of the most prominent and widespread analysis techniques for pattern detection; however, there are only a few visualization methods for them. We introduce an alternative visual representation that also incorporates time, since patterns are likely to change over time when the underlying data was collected from real-world processes. We apply the technique to both an artificial and a complex real-life dataset and show that the combined automatic and visual approach gives more and faster insight into the data than a fully automatic approach alone. Thus, our proposed method is capable of considerably reducing the analysis time.


2016 ◽  
Vol 12 (2) ◽  
pp. e1004755 ◽  
Author(s):  
Ting Wang ◽  
Zhao Ren ◽  
Ying Ding ◽  
Zhou Fang ◽  
Zhe Sun ◽  
...  

2014 ◽  
Vol 12 (04) ◽  
pp. 1450018 ◽  
Author(s):  
Imhoi Koo ◽  
Sen Yao ◽  
Xiang Zhang ◽  
Seongho Kim

The Gaussian graphical model (GGM)-based method, a key approach to reverse engineering biological networks, uses partial correlation to measure conditional dependence between two variables by controlling the contribution from other variables. After estimating partial correlation coefficients, one of the most critical processes in network construction is to control the false discovery rate (FDR) to assess the significant associations among variables. Various FDR methods have been proposed, mainly for biomarker discovery, but it remains unclear which FDR method performs better for network construction. Furthermore, no study has examined the effect of the network structure on network construction. We selected six FDR methods, the linear step-up procedure (BH95), the adaptive linear step-up procedure (BH00), Efron's local FDR (LFDR), Benjamini–Yekutieli's step-up procedure (BY01), Storey's q-value procedure (Storey01), and Storey–Taylor–Siegmund's adaptive step-up procedure (STS04), to evaluate their performances on network construction. We further considered two network structures, random and scale-free networks, to investigate their influence on network construction. Both simulated data and real experimental data suggest that STS04 provides the highest true positive rate (TPR) or F1 score, while BY01 has the highest positive predictive value (PPV) in network construction. In addition, no significant effect of the network structure is found on FDR methods.
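
The pipeline described in this abstract (partial correlations, then FDR control over candidate edges) can be illustrated with a minimal sketch. It assumes more samples than variables so the sample covariance is invertible, uses a Fisher z-approximation for the p-values (an assumption, not stated in the abstract), and implements only the linear step-up procedure (BH95). The function names are hypothetical.

```python
import math
import numpy as np

def partial_correlations(X):
    """Partial correlation matrix from the inverse sample covariance.

    X is an (n_samples, n_variables) array; assumes n_samples > n_variables.
    """
    precision = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(precision))
    R = -precision / np.outer(d, d)   # r_ij = -p_ij / sqrt(p_ii * p_jj)
    np.fill_diagonal(R, 1.0)
    return R

def edge_pvalues(R, n):
    """Two-sided p-values for each candidate edge (upper triangle of R).

    Uses the Fisher z-approximation with p - 2 conditioning variables;
    this choice of test is an assumption of the sketch.
    """
    p = R.shape[0]
    iu = np.triu_indices(p, k=1)
    r = np.clip(R[iu], -0.999999, 0.999999)
    z = np.sqrt(n - (p - 2) - 3) * np.arctanh(r)
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return iu, pvals

def benjamini_hochberg(pvals, alpha=0.05):
    """Linear step-up procedure (BH95): boolean mask of rejected hypotheses."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest rank meeting its threshold
        reject[order[:k + 1]] = True
    return reject

# Usage on synthetic data: variable 1 depends on variable 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
X[:, 1] = X[:, 0] + 0.5 * rng.normal(size=200)
R = partial_correlations(X)
(rows, cols), pvals = edge_pvalues(R, n=X.shape[0])
keep = benjamini_hochberg(pvals, alpha=0.05)
edges = list(zip(rows[keep].tolist(), cols[keep].tolist()))
```

The other five procedures compared in the abstract differ in how the rejection threshold is set; they are not implemented here.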


Author(s):  
Shen Liu ◽  
Hongyan Liu

Tags have been adopted by many online services as a method to manage their online resources. Effective tagging benefits both users and firms. In real applications providing a user tagging mechanism, only a small portion of tags are usually provided by users. Therefore, an automatic tagging method, which can assign tags to different items automatically, is urgently needed. Previous works on automatic tagging focus on exploring the tagging behavior of users or the content information of items. In online service platforms, users frequently browse items related to their interests, which implies users’ judgment about the underlying features of items and is helpful for automatic tagging. Browsing-behavior records are much more plentiful compared with tagging behavior and easy to collect. However, existing studies about automatic tagging ignore this kind of information. To properly integrate both browsing behaviors and content information for automatic tagging, we propose a novel probabilistic graphical model and develop a new algorithm for the model parameter inference. We conduct thorough experiments on a real-world data set to evaluate and analyze the performance of our proposed method. The experimental results demonstrate that our approach achieves better performance than state-of-the-art automatic tagging methods.

Summary of Contribution. In this paper, we study how to automatically assign tags to items in an e-commerce background. Our study is about how to perform item tagging for e-commerce and other online service providers so that consumers can easily find what they need and firms can manage their resources effectively. Specifically, we study if consumer browsing behavior can be utilized to perform the tagging task automatically, which can save efforts of both firms and consumers. Additionally, we transform the problem into how to find the most proper tags for items and propose a novel probabilistic graphical model to model the generation process of tags. Finally, we develop a variational inference algorithm to learn the model parameters, and the model shows superior performance over competing benchmark models. We believe this study contributes to machine learning techniques.


2016 ◽  
Vol 26 (04) ◽  
pp. 1650017 ◽  
Author(s):  
Farid Yaghouby ◽  
Bruce F. O’Hara ◽  
Sridhar Sunderam

The proportion, number of bouts, and mean bout duration of different vigilance states (Wake, NREM, REM) are useful indices of dynamics in experimental sleep research. These metrics are estimated by first scoring state, sometimes using an algorithm, based on electrophysiological measurements such as the electroencephalogram (EEG) and electromyogram (EMG), and computing their values from the score sequence. Isolated errors in the scores can lead to large discrepancies in the estimated sleep metrics. But most algorithms score sleep by classifying the state from EEG/EMG features independently in each time epoch without considering the dynamics across epochs, which could provide contextual information. The objective here is to improve estimation of sleep metrics by fitting a probabilistic dynamical model to mouse EEG/EMG data and then predicting the metrics from the model parameters. Hidden Markov models (HMMs) with multivariate Gaussian observations and Markov state transitions were fitted to unlabeled 24-h EEG/EMG feature time series from 20 mice to model transitions between the latent vigilance states; a similar model with unbiased transition probabilities served as a reference. Sleep metrics predicted from the HMM parameters did not deviate significantly from manual estimates except for rapid eye movement sleep (REM) ([Formula: see text]; Wilcoxon signed-rank test). Changes in value from Light to Dark conditions correlated well with manually estimated differences (Spearman’s rho 0.43–0.84) except for REM. HMMs also scored vigilance state with over 90% accuracy. HMMs of EEG/EMG features can therefore characterize sleep dynamics from EEG/EMG measurements, a prerequisite for characterizing the effects of perturbation in sleep monitoring and control applications.
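
As an illustration of how sleep metrics can follow from HMM parameters, the sketch below derives state proportions, mean bout durations, and expected bout counts from a transition matrix alone, using standard Markov chain identities. It is not necessarily the computation used in the article. The function name, the epoch length, and the example matrix are hypothetical.

```python
import numpy as np

def sleep_metrics(A, epoch_sec, n_epochs):
    """Metrics implied by a row-stochastic transition matrix A.

    Returns (proportion, mean_bout_sec, expected_bouts), one entry per state.
    Assumes the chain is ergodic and that every self-transition
    probability is below 1.
    """
    A = np.asarray(A, dtype=float)
    # Stationary distribution: left eigenvector of A for eigenvalue 1.
    eigvals, eigvecs = np.linalg.eig(A.T)
    pi = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    pi = pi / pi.sum()
    stay = np.diag(A)
    # Bout length in epochs is geometric with mean 1 / (1 - a_ii).
    mean_bout_sec = epoch_sec / (1.0 - stay)
    # Expected number of entries into each state over n_epochs transitions.
    expected_bouts = n_epochs * pi * (1.0 - stay)
    return pi, mean_bout_sec, expected_bouts

# Usage with a made-up 3-state matrix (Wake, NREM, REM) and 4-s epochs.
A_example = [[0.95, 0.05, 0.00],
             [0.04, 0.93, 0.03],
             [0.10, 0.05, 0.85]]
proportion, bout_sec, bouts = sleep_metrics(A_example, epoch_sec=4.0, n_epochs=21600)
```

Because every metric is a function of the fitted transition matrix, an isolated scoring error in a single epoch does not directly split a bout, which is the motivation the abstract gives for a model-based estimate.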


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Camilla Lingjærde ◽  
Tonje G. Lien ◽  
Ørnulf Borgan ◽  
Helga Bergholtz ◽  
Ingrid K. Glad

Abstract

Background: Identifying gene interactions is a topic of great importance in genomics, and approaches based on network models provide a powerful tool for studying these. Assuming a Gaussian graphical model, a gene association network may be estimated from multiomic data based on the non-zero entries of the inverse covariance matrix. Inferring such biological networks is challenging because of the high dimensionality of the problem, making traditional estimators unsuitable. The graphical lasso is constructed for the estimation of sparse inverse covariance matrices in such situations, using $L_1$-penalization on the matrix entries. The weighted graphical lasso is an extension in which prior biological information from other sources is integrated into the model. There are however issues with this approach, as it naïvely forces the prior information into the network estimation, even if it is misleading or does not agree with the data at hand. Further, if an associated network based on other data is used as the prior, the method often fails to utilize the information effectively.

Results: We propose a novel graphical lasso approach, the tailored graphical lasso, that aims to handle prior information of unknown accuracy more effectively. We provide an R package implementing the method. Applying the method to both simulated and real multiomic data sets, we find that it outperforms the unweighted and weighted graphical lasso in terms of all performance measures we consider. In fact, the graphical lasso and weighted graphical lasso can be considered special cases of the tailored graphical lasso, and a parameter determined by the data measures the usefulness of the prior information. We also find that among a larger set of methods, the tailored graphical lasso is the most suitable for network inference from high-dimensional data with prior information of unknown accuracy. With our method, mRNA data are demonstrated to provide highly useful prior information for protein–protein interaction networks.

Conclusions: The method we introduce utilizes useful prior information more effectively without involving any risk of loss of accuracy should the prior information be misleading.
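
For orientation, the graphical lasso estimate referred to in this abstract is commonly written as the following penalized likelihood problem. The notation is an assumption of this note and is not taken from the article: $S$ is the sample covariance matrix, $\Theta$ the inverse covariance (precision) matrix, and $\lambda \ge 0$ the penalty parameter.

$$\hat{\Theta} = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert \Theta \rVert_1$$

A weighted variant replaces the single penalty with entrywise weights, sketched here with a hypothetical penalty matrix $P$ whose entries are smaller for edges supported by prior information, where $\circ$ denotes the entrywise product:

$$\hat{\Theta}_{P} = \arg\max_{\Theta \succ 0} \; \log\det\Theta - \operatorname{tr}(S\Theta) - \lambda \lVert P \circ \Theta \rVert_1$$

Edges of the estimated network correspond to the non-zero off-diagonal entries of $\hat{\Theta}$. Some formulations penalize only the off-diagonal entries. How the tailored graphical lasso constructs its weights from the prior is not specified in this abstract and is not reproduced here.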

