R-package netglm - generalized linear models for network data

2021 ◽  
Author(s):  
Timon Elmer

The netglm R-package estimates generalized linear models for network data based on the Multiple Regression Quadratic Assignment Procedure (MRQAP; Krackhardt, 1988). The package allows researchers to investigate associations between characteristics of dyads in networks (e.g., the level of homophily between two actors) and a binary or continuous tie variable (e.g., friendship, amount of time spent together). One unique feature of the package is that it can estimate multi-group MRQAP models, in which multiple networks are analyzed simultaneously (e.g., networks of multiple classrooms). Parallel processing is also implemented.
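The permutation idea behind QAP can be illustrated with a minimal sketch. It assumes a single dyadic predictor and a simple node-label permutation of the outcome network; it is not the netglm API, and multi-predictor MRQAP generally relies on more refined permutation schemes. All function and variable names here are hypothetical.

```python
import random

def off_diagonal(matrix):
    """Flatten a square matrix into a list of dyad values, skipping self-ties."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def permute_nodes(matrix, perm):
    """Relabel actors: rows and columns are permuted together."""
    n = len(matrix)
    return [[matrix[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

def qap_test(y, x, n_perm=500, seed=1):
    """Observed slope and two-sided permutation p-value (simple QAP)."""
    rng = random.Random(seed)
    x_vec = off_diagonal(x)
    observed = ols_slope(x_vec, off_diagonal(y))
    n = len(y)
    extreme = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        permuted = ols_slope(x_vec, off_diagonal(permute_nodes(y, perm)))
        # Small tolerance guards against floating-point ties.
        if abs(permuted) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Toy data (made up for illustration): six actors in two groups.
groups = [0, 0, 0, 1, 1, 1]
same_group = [[1 if gi == gj and i != j else 0
               for j, gj in enumerate(groups)]
              for i, gi in enumerate(groups)]
friendship = [row[:] for row in same_group]
friendship[0][3] = 1  # one cross-group tie
friendship[1][2] = 0  # one missing within-group tie

slope, p_value = qap_test(friendship, same_group)
print(slope, p_value)
```

Permuting rows and columns together keeps the dependence structure of the network intact while breaking its link to the predictor, which is why dyads are not shuffled independently.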

2018 ◽  
Author(s):  
Julián Candia ◽  
John S. Tsang

Abstract
Background: Regularized generalized linear models (GLMs) are popular regression methods in bioinformatics, particularly useful in scenarios with fewer observations than parameters/features or when many of the features are correlated. In both ridge and lasso regularization, feature shrinkage is controlled by a penalty parameter λ. The elastic net introduces a mixing parameter α to tune the shrinkage continuously from ridge to lasso. Selecting α objectively and determining which features contributed significantly to prediction after model fitting remain a practical challenge given the paucity of available software to evaluate performance and statistical significance.
Results: eNetXplorer builds on top of glmnet to address the above issues for linear (Gaussian), binomial (logistic), and multinomial GLMs. It provides new functionalities to empower practical applications by using a cross validation framework that assesses the predictive performance and statistical significance of a family of elastic net models (as α is varied) and of the corresponding features that contribute to prediction. The user can select which quality metrics to use to quantify the concordance between predicted and observed values, with defaults provided for each GLM. Statistical significance for each model (as defined by α) is determined based on comparison to a set of null models generated by random permutations of the response; the same permutation-based approach is used to evaluate the significance of individual features. In the analysis of large and complex biological datasets, such as transcriptomic and proteomic data, eNetXplorer provides summary statistics, output tables, and visualizations to help assess which subset(s) of features have predictive value for a set of response measurements, and to what extent those subset(s) of features can be expanded or reduced via regularization.
Conclusions: This package presents a framework and software for exploratory data analysis and visualization. By making regularized GLMs more accessible and interpretable, eNetXplorer guides the process to generate hypotheses based on features significantly associated with biological phenotypes of interest, e.g. to identify biomarkers for therapeutic responsiveness. eNetXplorer is also generally applicable to any research area that may benefit from predictive modeling and feature identification using regularized GLMs.
Availability and implementation: The package is available under GPL-3 license at the CRAN repository, https://CRAN.R-project.org/package=eNetXplorer
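To make the roles of λ and α concrete, here is a minimal sketch of the elastic net penalty term in the parameterization documented for glmnet, λ[(1 − α)/2 · ‖β‖₂² + α · ‖β‖₁]. The function name and the example coefficients are assumptions for illustration; this is neither eNetXplorer nor glmnet code.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: alpha = 0 gives ridge, alpha = 1 gives lasso."""
    l1_norm = sum(abs(b) for b in beta)
    l2_norm_sq = sum(b * b for b in beta)
    return lam * (0.5 * (1.0 - alpha) * l2_norm_sq + alpha * l1_norm)

# Hypothetical coefficient vector, used only to show the effect of alpha.
beta = [1.0, -2.0]
ridge_penalty = elastic_net_penalty(beta, lam=0.5, alpha=0.0)
mixed_penalty = elastic_net_penalty(beta, lam=0.5, alpha=0.5)
lasso_penalty = elastic_net_penalty(beta, lam=0.5, alpha=1.0)
print(ridge_penalty, mixed_penalty, lasso_penalty)
```

Sweeping α from 0 to 1 with λ tuned at each value yields the family of elastic net models that the abstract describes comparing.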


2021 ◽  
Author(s):  
Connor McCabe ◽  
Max Andrew Halvorson ◽  
Kevin Michael King ◽  
Xiaolin Cao ◽  
Dale Sim Kim

Many researchers hope to examine interaction effects using generalized linear models (GLMs) to predict outcomes on nonlinear scales. For instance, logistic and Poisson GLMs are used to estimate associations between predictors and outcomes in nonlinear probability and count scales, respectively. However, we (McCabe et al., 2021; Halvorson et al., in press) and others (Ai & Norton, 2003; Mize, 2019; Norton, Wang, & Ai, 2004) have shown that testing and interpreting interaction effects on these scales is not straightforward. GLMs require the application of partial derivatives and/or discrete differences to compute and probe interaction effects appropriately when models are interpreted on their nonlinear scale. Currently available open-source software does not provide methods of computing these interaction effects on probability and count scales, reflecting a central limitation in applying these methods in research practice. Here, we introduce `modglm`, an R-based software package that accompanies our manuscript providing recommendations for computing interaction effects in nonlinear probability and counts (McCabe et al., 2021). This software produces the interaction effect between two variables in generalized linear models of probabilities and counts and provides additional statistics and plotting utilities for evaluating and describing this effect.
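The role of partial derivatives can be illustrated for a logistic model with two continuous predictors. The sketch below assumes the linear predictor η = b0 + b1·x1 + b2·x2 + b12·x1·x2 and computes the cross-partial derivative of the predicted probability, then checks it against a finite-difference approximation. Function names and coefficient values are hypothetical; this is not the `modglm` interface.

```python
import math

def predicted_prob(x1, x2, coefs):
    """Predicted probability from a logistic model with a product term."""
    b0, b1, b2, b12 = coefs
    eta = b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2
    return 1.0 / (1.0 + math.exp(-eta))

def interaction_effect(x1, x2, coefs):
    """Cross-partial derivative d^2 p / (dx1 dx2) on the probability scale."""
    b0, b1, b2, b12 = coefs
    p = predicted_prob(x1, x2, coefs)
    slope = p * (1.0 - p)  # derivative of the logistic function w.r.t. eta
    return (b12 * slope
            + (b1 + b12 * x2) * (b2 + b12 * x1) * slope * (1.0 - 2.0 * p))

def numeric_cross_partial(x1, x2, coefs, h=1e-3):
    """Central finite-difference approximation, used as a sanity check."""
    f = predicted_prob
    return (f(x1 + h, x2 + h, coefs) - f(x1 + h, x2 - h, coefs)
            - f(x1 - h, x2 + h, coefs) + f(x1 - h, x2 - h, coefs)) / (4 * h * h)

# Made-up coefficients with NO product term (b12 = 0).
no_product_term = (-1.0, 0.8, 0.5, 0.0)
effect_without_product = interaction_effect(0.0, 0.0, no_product_term)

# Made-up coefficients with a product term.
with_product_term = (-1.0, 0.8, 0.5, 0.3)
effect_analytic = interaction_effect(0.5, -0.2, with_product_term)
effect_numeric = numeric_cross_partial(0.5, -0.2, with_product_term)
print(effect_without_product, effect_analytic, effect_numeric)
```

The first example shows why the product-term coefficient alone is not the interaction effect on the probability scale: even with b12 = 0, the cross-partial derivative is generally nonzero and varies with the predictor values.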


Author(s):  
Constantin Ahlmann-Eltze ◽  
Wolfgang Huber

Abstract
Motivation: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts (Grün et al., 2014; Svensson, 2020; Silverman et al., 2018; Hafemeister and Satija, 2019) and an essential building block for analysis approaches including differential expression analysis (Robinson et al., 2010; McCarthy et al., 2012; Anders and Huber, 2010; Love et al., 2014), principal component analysis (Townes et al., 2019) and factor analysis (Risso et al., 2018). Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation.
Results: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously.
Availability: The package glmGamPoi is available from Bioconductor for Windows, macOS, and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license.
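For readers unfamiliar with the model, the following sketch evaluates the Gamma-Poisson (negative binomial) log-probability under the common mean and overdispersion parameterization, in which the variance is μ + θμ². The symbols `mu` and `theta` and the function names are assumptions for illustration; this is not glmGamPoi code and says nothing about how that package is implemented.

```python
import math

def gamma_poisson_logpmf(y, mu, theta):
    """Log-probability of count y with mean mu > 0 and overdispersion theta > 0."""
    size = 1.0 / theta  # shape of the underlying Gamma mixing distribution
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))

def zero_count_logpmf(mu, theta):
    """For y = 0 the expression collapses to one term with no gamma functions."""
    size = 1.0 / theta
    return size * math.log(size / (size + mu))

# Numerical check of the moments for illustrative parameter values.
mu, theta = 2.0, 0.5
probs = [math.exp(gamma_poisson_logpmf(y, mu, theta)) for y in range(200)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
variance = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
print(total, mean, variance)
```

The zero-count shortcut is a property of the formula itself: when most entries of a count matrix are zero, a large share of the likelihood terms takes this simpler form.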


Author(s):  
Janet L. Peacock ◽  
Philip J. Peacock

Multiple variables per subject; Multifactorial methods: overview; Multifactorial methods: model selection; Multifactorial methods: challenges; Missing data; Generalized linear models; Multiple regression; Multiple regression: examples; Multiple regression and analysis of variance; Main effects and interactions; Linear and non-linear terms ...


Methodology ◽  
2006 ◽  
Vol 2 (1) ◽  
pp. 42-47 ◽  
Author(s):  
Bonne J. H. Zijlstra ◽  
Marijtje A. J. van Duijn ◽  
Tom A. B. Snijders

The p2 model is a random effects model with covariates for the analysis of binary directed social network data coming from a single observation of a social network. Here, a multilevel variant of the p2 model is proposed for the case of multiple observations of social networks, for example, in a sample of schools. The multilevel p2 model defines an identical p2 model for each independent observation of the social network, where parameters are allowed to vary across the multiple networks. The multilevel p2 model is estimated with a Bayesian Markov Chain Monte Carlo (MCMC) algorithm that was implemented in free software for the statistical analysis of complete social network data, called StOCNET. The new model is illustrated with a study on the practical support received by Dutch high school pupils of different ethnic backgrounds.

