Gap-com: General model selection criterion for sparse undirected gene networks with non-trivial community structure

Author(s):  
Markku Kuismin ◽  
Fatemeh Dodangeh ◽  
Mikko J Sillanpää

Abstract: We introduce a new model selection criterion for sparse complex gene network modeling in which gene co-expression relationships are estimated from data. The criterion is a novel formulation of the gap statistic and can be used to choose the regularization parameter in graphical models. It favors gene network structures that differ from the trivial gene interaction structure obtained purely at random. We call the criterion the gap-com statistic (gap community statistic). The idea is to examine the difference between the observed and the expected counts of communities (clusters), where the expected counts are evaluated using either data permutations or resampling of a reference graph (the Erdős-Rényi graph). The latter represents a trivial gene network structure determined by chance. We emphasize complex network inference because the structure of gene networks is usually non-trivial: for example, some genes cluster together and some act as hub genes. We evaluate the performance of the gap-com statistic in graphical model selection and compare it with some existing methods using simulated data and real biological data examples.
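
The observed-versus-expected comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact form of the statistic or the community detection algorithm, so connected components stand in for communities here, the gap is taken as a plain difference of counts, and the names `count_components`, `erdos_renyi_edges`, and `gap_sketch` are hypothetical.

```python
import random
from itertools import combinations


def count_components(n, edges):
    """Count connected components with union-find.

    Assumption: components are only a stand-in for communities.
    """
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(i) for i in range(n)})


def erdos_renyi_edges(n, m, rng):
    """Draw m distinct edges uniformly at random: a G(n, m) reference graph."""
    return rng.sample(list(combinations(range(n), 2)), m)


def gap_sketch(n, edges, n_ref=200, seed=0):
    """Observed count minus the mean count over random reference graphs.

    Assumption: a simple difference of counts, not the published formula.
    """
    rng = random.Random(seed)
    observed = count_components(n, edges)
    reference_counts = [
        count_components(n, erdos_renyi_edges(n, len(edges), rng))
        for _ in range(n_ref)
    ]
    expected = sum(reference_counts) / n_ref
    return observed - expected


# Toy graph: two disjoint triangles on six nodes.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
print(gap_sketch(6, two_triangles))
```

In a model selection setting, one graph per candidate regularization value could be scored in this way and the scores compared across candidates.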

2019 ◽  
Author(s):  
Jie Zhou ◽  
Anne Hoen ◽  
Susan McRitchie ◽  
Wimal Pathmasiri ◽  
Juliette Madan ◽  
...  

Summary: In light of the low signal-to-noise nature of many large biological data sets, we propose a novel method to identify the structure of association networks using a Gaussian graphical model combined with prior knowledge. Our algorithm has two parts. In the first part, we propose a model selection criterion called the structural Bayesian information criterion (SBIC), in which the prior structure is modeled and incorporated into the Bayesian information criterion (BIC). It is shown that the popular extended BIC (EBIC) is a special case of SBIC. In the second part, we propose a two-step algorithm to construct the candidate model pool. The algorithm is data-driven, and the prior structure is embedded into the candidate model automatically. Theoretical investigation shows that, under some mild conditions, SBIC is a consistent model selection criterion for the high-dimensional Gaussian graphical model. Simulation studies validate the superiority of SBIC over the standard BIC and show its robustness to model misspecification. Application to relative concentration data from infant feces, collected from subjects enrolled in a large molecular epidemiologic cohort study, validates that prior knowledge of metabolic pathway involvement is a statistically significant factor for the conditional dependence among metabolites. More importantly, the proposed algorithm identifies new relationships among metabolites that cannot be covered by conventional pathway analysis. Some of them have been widely recognized in the literature.
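
For orientation, the two baseline criteria named in the summary can be sketched as follows. This is only an illustration of BIC and the commonly cited form of EBIC for Gaussian graphical models; the SBIC formula is not given in the summary and is not reproduced here. The function names, the default `gamma`, and the candidate values are hypothetical.

```python
import math


def bic(loglik, n_edges, n_samples):
    """BIC for a graph with n_edges edges: -2 * loglik + |E| * log(n)."""
    return -2.0 * loglik + n_edges * math.log(n_samples)


def ebic(loglik, n_edges, n_samples, n_vars, gamma=0.5):
    """Extended BIC: BIC plus an extra penalty 4 * gamma * |E| * log(p).

    Setting gamma = 0 recovers the standard BIC.
    """
    return bic(loglik, n_edges, n_samples) + 4.0 * gamma * n_edges * math.log(n_vars)


def select_model(candidates, n_samples, n_vars, gamma=0.5):
    """Return the index of the candidate with the smallest EBIC.

    Each candidate is a (log-likelihood, edge count) pair.
    """
    scores = [ebic(ll, k, n_samples, n_vars, gamma) for ll, k in candidates]
    return min(range(len(scores)), key=scores.__getitem__)


# Made-up candidate pool: (maximized log-likelihood, number of edges).
pool = [(-520.0, 10), (-500.0, 25), (-495.0, 60)]
print(select_model(pool, n_samples=100, n_vars=50))
```

The extra `log(p)` term penalizes dense graphs more heavily as the number of variables grows, which is why EBIC is often preferred over BIC in high-dimensional settings.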


Forecasting ◽  
2021 ◽  
Vol 3 (1) ◽  
pp. 56-90
Author(s):  
Monica Defend ◽  
Aleksey Min ◽  
Lorenzo Portelli ◽  
Franz Ramsauer ◽  
Francesco Sandrini ◽  
...  

This article considers the estimation of Approximate Dynamic Factor Models with homoscedastic, cross-sectionally correlated errors for incomplete panel data. In contrast to existing estimation approaches, the presented estimation method comprises two expectation-maximization algorithms and uses conditional factor moments in closed form. To determine the unknown factor dimension and autoregressive order, we propose a two-step information-based model selection criterion. The performance of our estimation procedure and the model selection criterion is investigated within a Monte Carlo study. Finally, we apply the Approximate Dynamic Factor Model to real-economy vintage data to support investment decisions and risk management. For this purpose, an autoregressive model with the estimated factor span of the mixed-frequency data as exogenous variables maps the behavior of weekly S&P500 log-returns. We detect the main drivers of the index development and define two dynamic trading strategies resulting from prediction intervals for the subsequent returns.

