Component-based regularization of a multivariate GLM with a thematic partitioning of the explanatory variables

2018 ◽  
Vol 20 (1) ◽  
pp. 96-119 ◽  
Author(s):  
Xavier Bry ◽  
Catherine Trottier ◽  
Frédéric Mortier ◽  
Guillaume Cornu

We address component-based regularization of a multivariate generalized linear model (GLM). A vector of random responses [Formula: see text] is assumed to depend, through a GLM, on a set [Formula: see text] of explanatory variables, as well as on a set [Formula: see text] of additional covariates. [Formula: see text] is partitioned into [Formula: see text] conceptually homogeneous variable groups [Formula: see text], viewed as explanatory themes. Variables in each [Formula: see text] are assumed to be many and redundant. Thus, generalized linear regression demands dimension reduction and regularization with respect to each [Formula: see text]. By contrast, variables in [Formula: see text] are assumed to be few and selected so as to demand no regularization. Regularization is performed by searching each [Formula: see text] for an appropriate number of orthogonal components that both contribute to modelling [Formula: see text] and capture relevant structural information in [Formula: see text]. To estimate a single-theme model, we first propose an enhanced version of Supervised Component Generalized Linear Regression (SCGLR), based on a flexible measure of structural relevance of components, and able to deal with mixed-type explanatory variables. Then, to estimate the multiple-theme model, we develop an algorithm encapsulating this enhanced SCGLR: THEME-SCGLR. The method is tested on simulated data and then applied to rainforest data in order to model the abundance of tree species.

2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Daisuke Endo ◽  
Ryota Kobayashi ◽  
Ramon Bartolo ◽  
Bruno B. Averbeck ◽  
Yasuko Sugase-Miyamoto ◽  
...  

The recent increase in reliable, simultaneous high channel count extracellular recordings is exciting for physiologists and theoreticians because it offers the possibility of reconstructing the underlying neuronal circuits. We recently presented a method of inferring this circuit connectivity from neuronal spike trains by applying the generalized linear model to cross-correlograms. Although the algorithm can do a good job of circuit reconstruction, the parameters need to be carefully tuned for each individual dataset. Here we present another method using a Convolutional Neural Network for Estimating synaptic Connectivity from spike trains. After adaptation to huge amounts of simulated data, this method robustly captures the specific feature of monosynaptic impact in a noisy cross-correlogram. There are no user-adjustable parameters. With this new method, we have constructed diagrams of neuronal circuits recorded in several cortical areas of monkeys.
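
Both methods described above take a cross-correlogram as their input. As a minimal sketch of that quantity only (not the authors' GLM or neural-network method), the following Python computes a cross-correlogram from two spike trains. The function name, the 50 ms window, the 1 ms bin size, and the toy spike times are all assumptions chosen for illustration.

```python
import numpy as np

def cross_correlogram(ref_spikes, target_spikes, window=0.05, bin_size=0.001):
    """Histogram of target spike times relative to each reference spike (seconds)."""
    ref = np.asarray(ref_spikes, dtype=float)
    tgt = np.sort(np.asarray(target_spikes, dtype=float))
    n_bins = int(round(2 * window / bin_size))
    edges = np.linspace(-window, window, n_bins + 1)
    counts = np.zeros(n_bins, dtype=np.int64)
    for t in ref:
        # Restrict to target spikes inside the window around this reference spike.
        lo = np.searchsorted(tgt, t - window, side="left")
        hi = np.searchsorted(tgt, t + window, side="right")
        counts += np.histogram(tgt[lo:hi] - t, bins=edges)[0]
    return counts, edges

# Hypothetical toy data: every target spike follows a reference spike by 2.5 ms.
ref = np.array([0.1, 0.2, 0.3, 0.4, 0.5])
target = ref + 0.0025
counts, edges = cross_correlogram(ref, target)
peak = int(counts.argmax())
print(f"peak bin: [{edges[peak]:.4f}, {edges[peak + 1]:.4f}) s, count {counts[peak]}")
```

In this toy case all counts fall in the bin containing the 2.5 ms lag; in real recordings such a short-latency peak is buried in noise, which is the difficulty the abstract refers to.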


2021 ◽  
Vol 10 (1) ◽  
Author(s):  
Nicolas Pröllochs ◽  
Dominik Bär ◽  
Stefan Feuerriegel

Emotions are regarded as a dominant driver of human behavior, and yet their role in online rumor diffusion is largely unexplored. In this study, we empirically examine the extent to which emotions explain the diffusion of online rumors. We analyze a large-scale sample of 107,014 online rumors from Twitter, as well as their cascades. For each rumor, the embedded emotions were measured based on eight so-called basic emotions from Plutchik’s wheel of emotions (i.e., anticipation–surprise, anger–fear, trust–disgust, joy–sadness). We then used a generalized linear regression model to estimate how emotions are associated with the spread of online rumors in terms of (1) cascade size, (2) cascade lifetime, and (3) structural virality. Our results suggest that rumors conveying anticipation, anger, and trust generate more reshares, spread over longer time horizons, and become more viral. In contrast, smaller size, shorter lifetime, and lower virality are found for surprise, fear, and disgust. We further study how the presence of 24 dyadic emotional interactions (i.e., feelings composed of two emotions) is associated with diffusion dynamics. Here, we find that rumor cascades with high degrees of aggressiveness are larger in size, longer-lived, and more viral. Altogether, emotions embedded in online rumors are important determinants of the spreading dynamics.


2020 ◽  
Vol 1 (4) ◽  
pp. 140-147
Author(s):  
Dastan Maulud ◽  
Adnan M. Abdulazeez

Linear regression is perhaps one of the most common and comprehensive statistical and machine learning algorithms. It is used to find a linear relationship between a response and one or more predictors. Linear regression has two types: simple linear regression and multiple linear regression (MLR). This paper discusses various works by different researchers on linear regression and polynomial regression and compares their performance using the best approach to optimize prediction and precision. Almost all of the articles analyzed in this review are focused on datasets; in order to determine a model's efficiency, it must be correlated with the actual values obtained for the explanatory variables.
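
The distinction between simple and multiple regression can be illustrated with a short least-squares sketch in Python. It is not taken from the reviewed paper; the function name and the noise-free toy data are assumptions chosen for illustration.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Ordinary least squares; returns [intercept, slope_1, ..., slope_p]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)  # a single predictor: simple regression
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef

# Hypothetical noise-free data, so each fit recovers the generating coefficients.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

# Simple regression: one predictor, y = 1 + 2*x1.
simple_coef = fit_linear_regression(x1, 1.0 + 2.0 * x1)

# Multiple regression: two predictors, y = 1 + 2*x1 - 0.5*x2.
multiple_coef = fit_linear_regression(
    np.column_stack([x1, x2]), 1.0 + 2.0 * x1 - 0.5 * x2
)
print(simple_coef, multiple_coef)
```

The only difference between the two cases is the number of columns in the design matrix; the estimation step is identical.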


Author(s):  
Paolo Giudici

Several classes of computational and statistical methods for data mining are available. Each class can be parameterised so that models within the class differ in terms of such parameters (see, for instance, Giudici, 2003; Hastie et al., 2001; Han & Kamber, 2000; Hand et al., 2001; Witten & Frank, 1999): for example, the class of linear regression models, which differ in the number of explanatory variables; the class of Bayesian networks, which differ in the number of conditional dependencies (links in the graph); the class of tree models, which differ in the number of leaves; and the class of multi-layer perceptrons, which differ in terms of the number of hidden strata and nodes. Once a class of models has been established, the problem is to choose the “best” model from it.
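
As a minimal sketch of choosing a model within one class, the Python below compares linear regression models that differ in the number of explanatory variables. The abstract does not name a selection criterion; the Bayesian information criterion (BIC) used here, the function name, and the simulated data are assumptions chosen for illustration.

```python
import numpy as np

def bic_for_nested_linear_models(X, y):
    """Return {k: BIC} for least-squares models using the first k columns of X."""
    n, p = X.shape
    scores = {}
    for k in range(p + 1):
        design = np.column_stack([np.ones(n), X[:, :k]])  # intercept + k predictors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = float(np.sum((y - design @ coef) ** 2))
        # Gaussian BIC up to an additive constant:
        # n * log(RSS / n) + (number of coefficients) * log(n)
        scores[k] = n * np.log(rss / n) + (k + 1) * np.log(n)
    return scores

# Hypothetical simulated data: only the first two of four predictors matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

scores = bic_for_nested_linear_models(X, y)
best_k = min(scores, key=scores.get)  # lower BIC is better
print(best_k, scores)
```

The penalty term grows with the number of coefficients, so a larger model is preferred only when it reduces the residual sum of squares enough to pay for its extra parameters.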

