Interactively visualizing distributional regression models with distreg.vis

2021 ◽  
pp. 1471082X2110073
Author(s):  
Stanislaus Stadlmann ◽  
Thomas Kneib

A newly emerging field in statistics is distributional regression, where not only the mean but each parameter of a parametric response distribution can be modelled using a set of predictors. As an extension of generalized additive models, distributional regression utilizes the known link functions (log, logit, etc.), model terms (fixed, random, spatial, smooth, etc.) and available types of distributions but allows us to go well beyond the exponential family and to model potentially all distributional parameters. Due to this increase in model flexibility, the interpretation of covariate effects on the shape of the conditional response distribution, its moments and other features derived from this distribution is more challenging than with traditional mean-based methods. In particular, such quantities of interest often do not directly equal the modelled parameters but are rather a (potentially complex) combination of them. To ease the post-estimation model analysis, we propose a framework and subsequently feature an implementation in R for the visualization of Bayesian and frequentist distributional regression models fitted using the bamlss, gamlss and betareg R packages.
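
The remark that quantities of interest are combinations of the modelled parameters can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the distreg.vis implementation, which is in R. It assumes a log-normal response whose two parameters each depend on a single covariate x; the coefficient values and function names are hypothetical.

```python
import math

# Hypothetical coefficients, chosen only for illustration.
MU_COEF = (0.5, 0.3)      # mu(x) = 0.5 + 0.3 * x            (identity link)
SIGMA_COEF = (-1.0, 0.4)  # log(sigma(x)) = -1.0 + 0.4 * x   (log link)


def lognormal_params(x):
    """Return the modelled parameters (mu, sigma) at covariate value x."""
    mu = MU_COEF[0] + MU_COEF[1] * x
    sigma = math.exp(SIGMA_COEF[0] + SIGMA_COEF[1] * x)
    return mu, sigma


def lognormal_moments(x):
    """Mean and variance of the response; each depends on mu AND sigma."""
    mu, sigma = lognormal_params(x)
    mean = math.exp(mu + sigma ** 2 / 2)
    variance = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)
    return mean, variance


for x in (0.0, 1.0, 2.0):
    mean, variance = lognormal_moments(x)
    print(f"x={x:.1f}  mean={mean:.3f}  variance={variance:.3f}")
```

Under these assumptions, a covariate that enters only the predictor for sigma would still change the expected response, which is why the effect on the mean cannot be read off a single coefficient.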

Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 299
Author(s):  
Jaime Pinilla ◽  
Miguel Negrín

The interrupted time series analysis is a quasi-experimental design used to evaluate the effectiveness of an intervention. Segmented linear regression models have been the most used models to carry out this analysis. However, they assume a linear trend that may not be appropriate in many situations. In this paper, we show how generalized additive models (GAMs), a non-parametric regression-based method, can be useful to accommodate nonlinear trends. An analysis with simulated data is carried out to assess the performance of both models. Data were simulated from linear and non-linear (quadratic and cubic) functions. The results of this analysis show how GAMs improve on segmented linear regression models when the trend is non-linear, but they also show a good performance when the trend is linear. A real-life application, the impact of the 2012 Spanish cost-sharing reforms on pharmaceutical prescription, is also analyzed. Seasonality and an indicator variable for the stockpiling effect are included as explanatory variables. The segmented linear regression model shows a good fit to the data. However, the GAM rejects the hypothesis of a linear trend. The estimated level shift is similar for both models but the cumulative absolute effect on the number of prescriptions is lower under the GAM.
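
For readers unfamiliar with the segmented linear regression baseline mentioned above, the following Python sketch shows its usual mean function and how a level shift and a cumulative effect can be read from it. This is a generic textbook parametrization assumed for illustration, not the authors' code; the coefficient values and the intervention time are hypothetical, and seasonality and stockpiling terms are omitted.

```python
def segmented_mean(t, t0, b0, b1, b2, b3):
    """Expected outcome at time t under a segmented linear regression.

    b0: baseline level, b1: pre-intervention slope,
    b2: level shift at the intervention time t0,
    b3: change in slope after t0.
    """
    post = 1.0 if t >= t0 else 0.0
    return b0 + b1 * t + b2 * post + b3 * (t - t0) * post


def counterfactual_mean(t, b0, b1):
    """Expected outcome had the pre-intervention trend simply continued."""
    return b0 + b1 * t


def cumulative_effect(times, t0, b0, b1, b2, b3):
    """Sum of (fitted - counterfactual) over the post-intervention times."""
    return sum(
        segmented_mean(t, t0, b0, b1, b2, b3) - counterfactual_mean(t, b0, b1)
        for t in times
        if t >= t0
    )


# Hypothetical values: 24 monthly observations, intervention at month 12.
effect = cumulative_effect(range(24), t0=12, b0=100.0, b1=0.5, b2=-10.0, b3=-0.2)
print(f"cumulative effect over the post-intervention period: {effect:.1f}")
```

A GAM replaces the two straight-line segments with smooth functions of time, so the counterfactual and the cumulative effect would be computed from the fitted smooth rather than from b1 and b3.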


Author(s):  
Alexander Silbersdorff ◽  
Kai Sebastian Schneider

This study addresses the much-discussed issue of the relationship between health and income. In particular, it focuses on the relation between mental health and household income by using generalized additive models of location, scale and shape and thus employing a distributional perspective. Furthermore, this study aims to give guidelines to applied researchers interested in taking a distributional perspective on health inequalities. In our analysis we use cross-sectional data from the German Socio-Economic Panel (SOEP). We find that, when looking not only at the expected mental health score of an individual but also at other distributional aspects, such as the risk of moderate and severe mental illness, the relationship between income and mental health is much more pronounced. We thus show that taking a distributional perspective can add to and indeed enrich the mostly mean-based assessment of existing health inequalities.


2021 ◽  
Author(s):  
Bailey Anderson ◽  
Louise Slater ◽  
Simon Dadson ◽  
Annalise Blum

There is still limited quantitative understanding of the effects of tree cover and urbanisation on streamflow at large scales, making it difficult to generalize these relationships. We use the globally consistent European Space Agency (ESA) Climate Change Initiative (CCI) Global Land Cover dataset to estimate the relationships between streamflow, calculated as high (Q0.99), median (Q0.50), and low (Q0.01) flow quantiles, and urbanization or tree cover changes in 2865 catchments between 1992 and 2018. We apply three statistical modelling approaches and examine the consistencies and inconsistencies between them. First, we use distributional regression models, specifically generalized additive models for location, scale, and shape (GAMLSS), at each site and assess goodness-of-fit. Model fits suggested a strong association between land cover, especially urban area, and low and median flows at sites with statistically significant trends in streamflow. We then examine the sign of the distributional regression model coefficients to determine whether the inclusion of a land cover variable in the regression models results in a relative increase or decrease in flow, regardless of the overall direction of trends in streamflow. Finally, we use fixed effects panel regression models to estimate the average effect across all sites. Panel regression results suggested that a 1% increase in urban area corresponds to an increase in streamflow of between < 1% and 2.1% for all quantiles. Results for the tree cover panel regression models were not significant. We highlight the value of statistical approaches for large-sample attribution of hydrological change, while cautioning that considerable variability exists across catchments and modelling approaches.


Entropy ◽  
2021 ◽  
Vol 23 (4) ◽  
pp. 469
Author(s):  
Thiago G. Ramires ◽  
Luiz R. Nakamura ◽  
Ana J. Righetto ◽  
Renan J. Carvalho ◽  
Lucas A. Vieira ◽  
...  

This paper presents a discussion regarding regression models, especially those belonging to the location class. Our main motivation is that, with simple distributions having simple interpretations, in some cases, one gets better results than the ones obtained with overly complex distributions. For instance, with the reverse Gumbel (RG) distribution, it is possible to explain response variables by making use of the generalized additive models for location, scale, and shape (GAMLSS) framework, which allows the fitting of several parameters (characteristics) of the probabilistic distributions, like mean, mode, variance, and others. Three real data applications are used to compare several location models against the RG under the GAMLSS framework. The intention is to show that the use of a simple distribution (e.g., RG) based on a more sophisticated regression structure may be preferable to using a more complex location model.


2021 ◽  
Author(s):  
Drew Thomas

Media commentary has suggested that recent Black Lives Matter (BLM) protests, particularly riots, drove voters, particularly Hispanic voters, away from Democratic candidate Joe Biden in the 2020 US presidential election. I test these hypotheses with county-level regression models of 2016-to-2020 swing towards the Democratic presidential candidate, using the presence and intensity of BLM non-riot protests and riots as regressors, controlling for state and many background demographic factors (population density, household size, racial composition, etc.). The models (generalized additive models) that control most aggressively for background factors find small and positive associations between BLM protests and Democratic swing: counties with non-riot BLM protests swung more towards Joe Biden by 0.2 percentage points, and counties with BLM-associated riots swung more towards Joe Biden by (a statistically insignificant) 0.1 percentage points. The extra BLM-protest swing was not statistically significantly different in counties with relatively many Hispanic voting-age citizens, although it was weaker in counties with relatively many Asian voting-age citizens. Inasmuch as these results reflect causal impacts of BLM protests, the protests enhanced the Democratic swing but were probably not electorally decisive. My most elaborate model suggests that a lack of BLM protests in 2020 would have flipped only one state: Biden might have narrowly lost Arizona.


2018 ◽  
Vol 18 (3-4) ◽  
pp. 248-273 ◽  
Author(s):  
Mikis D Stasinopoulos ◽  
Robert A Rigby ◽  
Fernanda De Bastiani

Abstract: A tutorial of the generalized additive models for location, scale and shape (GAMLSS) is given here using two examples. GAMLSS is a general framework for performing regression analysis where not only the location (e.g., the mean) of the distribution but also the scale and shape of the distribution can be modelled by explanatory variables.
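
The idea that both location and scale can depend on explanatory variables can be sketched as follows. This is an illustrative Python example under simplifying assumptions (a Gaussian response, one covariate, linear predictors), not the API of the gamlss R package; the function name and the toy data are hypothetical.

```python
import math


def gaussian_nll(beta, gamma, xs, ys):
    """Negative log-likelihood of a Gaussian location-scale model.

    Location: mu(x) = beta[0] + beta[1] * x              (identity link)
    Scale:    log(sigma(x)) = gamma[0] + gamma[1] * x    (log link)
    """
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta[0] + beta[1] * x
        sigma = math.exp(gamma[0] + gamma[1] * x)
        total += (
            math.log(sigma)
            + 0.5 * math.log(2 * math.pi)
            + 0.5 * ((y - mu) / sigma) ** 2
        )
    return total


# Toy data whose spread grows with x (hypothetical values).
xs = [0.0, 0.0, 2.0, 2.0]
ys = [-0.1, 0.1, -2.0, 2.0]

# Constant scale (sigma = 1) versus a scale that increases with x.
constant_scale = gaussian_nll((0.0, 0.0), (0.0, 0.0), xs, ys)
slope = (math.log(2.0) - math.log(0.1)) / 2.0
varying_scale = gaussian_nll((0.0, 0.0), (math.log(0.1), slope), xs, ys)
print(constant_scale > varying_scale)
```

On these toy data the model with a covariate-dependent scale attains the lower negative log-likelihood, which is the kind of improvement a location-only model cannot capture.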


2017 ◽  
Vol 17 (1-2) ◽  
pp. 1-35 ◽  
Author(s):  
Sonja Greven ◽  
Fabian Scheipl

Researchers are increasingly interested in regression models for functional data. This article discusses a comprehensive framework for additive (mixed) models for functional responses and/or functional covariates based on the guiding principle of reframing functional regression in terms of corresponding models for scalar data, allowing the adaptation of a large body of existing methods for these novel tasks. The framework encompasses many existing as well as new models. It includes regression for ‘generalized’ functional data, mean regression, quantile regression as well as generalized additive models for location, shape and scale (GAMLSS) for functional data. It admits many flexible linear, smooth or interaction terms of scalar and functional covariates as well as (functional) random effects and allows flexible choices of bases (particularly splines and functional principal components) and corresponding penalties for each term. It covers functional data observed on common (dense) or curve-specific (sparse) grids. Penalized-likelihood-based and gradient-boosting-based inference for these models is implemented in the R packages refund and FDboost, respectively. We also discuss identifiability and computational complexity for the functional regression models covered. A running example on a longitudinal multiple sclerosis imaging study serves to illustrate the flexibility and utility of the proposed model class. Reproducible code for this case study is made available online.


2021 ◽  
Vol 69 (3) ◽  
pp. 267-290
Author(s):  
Alexander Rauhut

Abstract: Lexical ambiguity in the English language is abundant. Word-class ambiguity is even inherently tied to the productive process of conversion. Most lexemes are rather flexible when it comes to word class, which is facilitated by the minimal morphology that English has preserved. This study takes a multivariate quantitative approach to examine potential patterns that arise in a lexicon where verb-noun and noun-verb conversion are pervasive. The distributions of three inflectional suffixes, verbal -s, nominal -s, and -ed, are explored for their interaction with degrees of verb-noun conversion. In order to achieve that, the lexical dispersion, context-dependency, and lexical similarity between the inflected and bare forms were taken into consideration and controlled for in a generalized additive model for location, scale and shape (GAMLSS; Stasinopoulos, M. D., R. A. Rigby, and F. De Bastiani. 2018. “GAMLSS: A Distributional Regression Approach.” Statistical Modelling 18 (3–4): 248–73). The results of a series of zero-one-inflated beta models suggest that there is a clear “uncanny valley” of lexemes that show similar proportions of verbal and nominal uses. Such lexemes have a lower proportion of inflectional uses when textual dispersion and context-dependency are controlled for. Furthermore, as soon as there is some degree of conversion, the probability that a lexeme is always encountered without inflection sharply rises. Disambiguation by means of inflection is unlikely to play a uniform role; its role depends on the inflectional distribution of a lexeme.


2018 ◽  
Vol 18 (3-4) ◽  
pp. 219-247 ◽  
Author(s):  
Nikolaus Umlauf ◽  
Thomas Kneib

Abstract: Bayesian methods have become increasingly popular in the past two decades. With the constant rise of computational power, even very complex models can be estimated on virtually any modern computer. Moreover, interest has shifted from conditional mean models to probabilistic distributional models capturing location, scale, shape and other aspects of a response distribution, where covariate effects can have flexible forms, for example, linear, non-linear, spatial or random effects. This tutorial article discusses how to select models in the Bayesian distributional regression setting, how to monitor convergence of the Markov chains and how to use simulation-based inference also for quantities derived from the original model parametrization. We exemplify the workflow using daily weather data on (a) temperatures on Germany's highest mountain and (b) extreme values of precipitation for the whole of Germany.

