distributional regression Latest Research Papers

Deselection of base-learners for statistical boosting—with an application to distributional regression

Statistical Methods in Medical Research ◽

10.1177/09622802211051088 ◽

2021 ◽

pp. 096228022110510

Author(s):

Annika Strömer ◽

Christian Staerk ◽

Nadja Klein ◽

Leonie Weinhold ◽

Stephanie Titze ◽

...

Keyword(s):

Variable Selection ◽

Alternative Methods ◽

Gradient Boosting ◽

Minor Importance ◽

Related Quality ◽

Health Related ◽

Low Dimensional ◽

The Impact ◽

Distributional Regression

We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and allows regression models to be fitted in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models tend to include too many variables in some situations. This occurs particularly for low-dimensional data ([Formula: see text]), where we observe a slow overfitting behavior of boosting. As a result, more variables are included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but they lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners with minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison to alternative methods, including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data from an ongoing cohort study of chronic kidney disease patients, where the most influential predictors for the health-related quality of life measure are selected in a distributional regression approach based on beta regression.
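The deselection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes squared-error loss with one linear base-learner per column, and it assumes that "minor importance" means a share of the total risk reduction below a threshold. The function names, the threshold `tau`, and the simulated data are all hypothetical.

```python
import numpy as np

def componentwise_l2_boost(X, y, mstop=200, nu=0.1):
    """Component-wise L2 boosting with one linear base-learner per column."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # centre columns: no intercept per learner
    col_ss = (Xc ** 2).sum(axis=0)     # assumes no constant columns
    offset = y.mean()
    fit = np.full(n, offset)
    coef = np.zeros(p)
    risk_reduction = np.zeros(p)       # risk reduction attributed to each learner
    for _ in range(mstop):
        u = y - fit                    # negative gradient of the squared-error loss
        beta = Xc.T @ u / col_ss       # least-squares fit of every learner to u
        rss = (u ** 2).sum() - beta ** 2 * col_ss
        j = int(np.argmin(rss))        # best-fitting base-learner in this iteration
        risk_before = (u ** 2).mean()
        fit = fit + nu * beta[j] * Xc[:, j]
        coef[j] += nu * beta[j]
        risk_reduction[j] += risk_before - ((y - fit) ** 2).mean()
    return offset, coef, risk_reduction

def boost_with_deselection(X, y, tau=0.01, mstop=200, nu=0.1):
    """Boost, drop learners whose risk-reduction share is below tau, then refit."""
    _, _, rr = componentwise_l2_boost(X, y, mstop, nu)
    selected = np.flatnonzero(rr > 0)          # learners chosen at least once
    kept = np.flatnonzero(rr / rr.sum() >= tau)  # assumed importance rule
    offset, coef_kept, _ = componentwise_l2_boost(X[:, kept], y, mstop, nu)
    final_coef = np.zeros(X.shape[1])
    final_coef[kept] = coef_kept
    return selected, kept, offset, final_coef

# Simulated low-dimensional data: only columns 0 and 1 are informative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

selected, kept, offset, final_coef = boost_with_deselection(X, y)
print("selected before deselection:", selected)
print("kept after deselection:     ", kept)
```

In this toy setting the first boosting run typically picks up several noise columns with tiny coefficients, and the refit on the retained columns yields a smaller model. The abstract describes the same trade-off for low-dimensional data.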

Effect Selection and Regularization in Structured Additive Distributional Regression

10.1201/9781003089018-12 ◽

2021 ◽

pp. 271-296

Author(s):

Paul Wiemann ◽

Thomas Kneib ◽

Helga Wagner

Keyword(s):

Distributional Regression

Machine learning methods for postprocessing ensemble forecasts of wind gusts: A systematic comparison

Monthly Weather Review ◽

10.1175/mwr-d-21-0150.1 ◽

2021 ◽

Author(s):

Benedikt Schulz ◽

Sebastian Lerch

Keyword(s):

Machine Learning ◽

Systematic Errors ◽

Ensemble Prediction ◽

Wind Gust ◽

Learning Methods ◽

Ensemble Prediction System ◽

Systematic Comparison ◽

Machine Learning Methods ◽

Wind Gusts ◽

Distributional Regression

Abstract Postprocessing ensemble weather predictions to correct systematic errors has become a standard practice in research and operations. However, only a few recent studies have focused on ensemble postprocessing of wind gust forecasts, despite its importance for severe weather warnings. Here, we provide a comprehensive review and systematic comparison of eight statistical and machine learning methods for probabilistic wind gust forecasting via ensemble postprocessing, which can be divided into three groups: state-of-the-art postprocessing techniques from statistics (ensemble model output statistics (EMOS), member-by-member postprocessing, isotonic distributional regression), established machine learning methods (gradient-boosting extended EMOS, quantile regression forests) and neural network-based approaches (distributional regression network, Bernstein quantile network, histogram estimation network). The methods are systematically compared using six years of data from a high-resolution, convection-permitting ensemble prediction system that was run operationally at the German weather service, and hourly observations at 175 surface weather stations in Germany. While all postprocessing methods yield calibrated forecasts and are able to correct the systematic errors of the raw ensemble predictions, incorporating information from additional meteorological predictor variables beyond wind gusts leads to significant improvements in forecast skill. In particular, we propose a flexible framework of locally adaptive neural networks with different probabilistic forecast types as output, which not only significantly outperform all benchmark postprocessing methods but also learn physically consistent relations associated with the diurnal cycle, especially the evening transition of the planetary boundary layer.

Correcting for sample selection bias in Bayesian distributional regression models

Computational Statistics & Data Analysis ◽

10.1016/j.csda.2021.107382 ◽

2021 ◽

pp. 107382

Author(s):

Paul F.V. Wiemann ◽

Nadja Klein ◽

Thomas Kneib

Keyword(s):

Selection Bias ◽

Regression Models ◽

Sample Selection ◽

Sample Selection Bias ◽

Distributional Regression

Distributional Regression Forests Approach to Regional Frequency Analysis with Partial Duration Series

Water Resources Research ◽

10.1029/2021wr029909 ◽

2021 ◽

Author(s):

K. G. Kiran ◽

V. V. Srinivas

Keyword(s):

Frequency Analysis ◽

Regional Frequency Analysis ◽

Partial Duration Series ◽

Distributional Regression ◽

Regional Frequency

Exploring the Effect of Conversion on the Distribution of Inflectional Suffixes: A Multivariate Corpus Study

Zeitschrift für Anglistik und Amerikanistik ◽

10.1515/zaa-2021-2024 ◽

2021 ◽

Vol 69 (3) ◽

pp. 267-290

Author(s):

Alexander Rauhut

Keyword(s):

English Language ◽

Generalized Additive Models ◽

Statistical Modelling ◽

Additive Models ◽

Context Dependency ◽

Word Class ◽

Corpus Study ◽

Productive Process ◽

Lexical Similarity ◽

Distributional Regression

Abstract Lexical ambiguity in the English language is abundant. Word-class ambiguity is even inherently tied to the productive process of conversion. Most lexemes are rather flexible when it comes to word class, which is facilitated by the minimal morphology that English has preserved. This study takes a multivariate quantitative approach to examine potential patterns that arise in a lexicon where verb-noun and noun-verb conversion are pervasive. The distributions of three inflectional suffixes, verbal -s, nominal -s, and -ed, are explored for their interaction with degrees of verb-noun conversion. To that end, the lexical dispersion, context-dependency, and lexical similarity between the inflected and bare forms were taken into consideration and controlled for in a Generalized Additive Model for Location, Scale and Shape (GAMLSS; Stasinopoulos, M. D., R. A. Rigby, and F. De Bastiani. 2018. “GAMLSS: A Distributional Regression Approach.” Statistical Modelling 18 (3–4): 248–73). The results of a series of zero-one-inflated beta models suggest that there is a clear “uncanny” valley of lexemes that show similar proportions of verbal and nominal uses. Such lexemes have a lower proportion of inflectional uses when textual dispersion and context-dependency are controlled for. Furthermore, as soon as there is some degree of conversion, the probability that a lexeme is always encountered without inflection sharply rises. Disambiguation by means of inflection is unlikely to play a uniform role depending on the inflectional distribution of a lexeme.

Isotonic distributional regression

Journal of the Royal Statistical Society Series B (Statistical Methodology) ◽

10.1111/rssb.12450 ◽

2021 ◽

Author(s):

Alexander Henzi ◽

Johanna F. Ziegel ◽

Tilmann Gneiting

Keyword(s):

Distributional Regression

Modelling chronic malnutrition in Zambia: A Bayesian distributional regression approach

PLoS ONE ◽

10.1371/journal.pone.0255073 ◽

2021 ◽

Vol 16 (8) ◽

pp. e0255073

Author(s):

Given Moonga ◽

Stephan Böse-O’Reilly ◽

Ursula Berger ◽

Kenneth Harttgen ◽

Charles Michelo ◽

...

Keyword(s):

Maternal Education ◽

Credible Interval ◽

Sub Saharan Africa ◽

Spatial Effects ◽

Z Score ◽

Chronic Malnutrition ◽

Under Five ◽

Non Linear ◽

Distributional Regression ◽

Height For Age

Background The burden of child under-nutrition remains a global challenge, with greater severity being faced by low- and middle-income countries, despite the strategies in the Sustainable Development Goals (SDGs). Globally, malnutrition is one of the most important risk factors associated with illness and death, affecting hundreds of millions of pregnant women and young children. Sub-Saharan Africa is one of the regions in the world struggling with the burden of chronic malnutrition. The 2018 Zambia Demographic and Health Survey (ZDHS) report estimated that 35% of the children under five years of age are stunted. The objective of this study was to analyse the distribution of stunting in Zambia and its associated factors.
Methods We analysed the relationships between socio-economic and remotely sensed characteristics and anthropometric outcomes in children under five, using Bayesian distributional regression. Georeferenced data were available for 25,852 children from two waves of the ZDHS; 31% of the observations were from the 2007 wave and 69% from the 2013/14 wave. We assessed the linear, non-linear and spatial effects of covariates on the height-for-age z-score.
Results Stunting decreased between 2007 and 2013/14 from a mean z-score of -1.59 (credible interval (CI): -1.63; -1.55) to -1.47 (CI: -1.49; -1.44). We found a strong non-linear relationship for the education of the mother and the wealth of the household on the height-for-age z-score. Moreover, increasing levels of maternal education above the eighth grade were associated with a reduced variation of stunting. Our study finds that remotely sensed covariates alone explain little of the variation of the height-for-age z-score, which highlights the importance of collecting socio-economic characteristics and of controlling for socio-economic characteristics of the individual and the household.
Conclusions While stunting remains unacceptably high in Zambia, with remarkable regional inequalities, the decline is lagging behind goal two of the SDGs. This emphasises the need for policies that help to reduce the share of chronically malnourished children within Zambia.

Rage Against the Mean – A Review of Distributional Regression Approaches

Econometrics and Statistics ◽

10.1016/j.ecosta.2021.07.006 ◽

2021 ◽

Author(s):

Thomas Kneib ◽

Alexander Silbersdorff ◽

Benjamin Säfken

Keyword(s):

The Mean ◽

Distributional Regression

Evaluating distributional regression strategies for modelling self-reported sexual age-mixing

eLife ◽

10.7554/elife.68318 ◽

2021 ◽

Vol 10 ◽

Author(s):

Timothy M Wolock ◽

Seth Flaxman ◽

Kathryn A Risher ◽

Tawanda Dadirai ◽

Simon Gregson ◽

...

Keyword(s):

Disease Transmission ◽

Sexual Partner ◽

Probability Distributions ◽

Transmitted Disease ◽

Data Sets ◽

Sexually Transmitted ◽

Partnership Formation ◽

Mixing Dynamics ◽

Diverse Data ◽

Distributional Regression

The age dynamics of sexual partnership formation determine patterns of sexually transmitted disease transmission and have long been a focus of researchers studying human immunodeficiency virus. Data on self-reported sexual partner age distributions are available from a variety of sources. We sought to explore statistical models that accurately predict the distribution of sexual partner ages over age and sex. We identified which probability distributions and outcome specifications best captured variation in partner age and quantified the benefits of modelling these data using distributional regression. We found that distributional regression with a sinh-arcsinh distribution replicated observed partner age distributions most accurately across three geographically diverse data sets. This framework can be extended with well-known hierarchical modelling tools and can help improve estimates of sexual age-mixing dynamics.
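For readers unfamiliar with the sinh-arcsinh distribution mentioned in the abstract above, the following minimal sketch evaluates its density in the Jones and Pewsey form with location `mu`, scale `sigma`, skewness `epsilon` and tail weight `delta`. The parametrisation used in the paper may differ, so the function name and argument names here are assumptions for illustration only.

```python
import numpy as np

def sinh_arcsinh_pdf(x, mu=0.0, sigma=1.0, epsilon=0.0, delta=1.0):
    """Density of X = mu + sigma * sinh((arcsinh(Z) + epsilon) / delta), Z ~ N(0, 1)."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    w = delta * np.arcsinh(z) - epsilon
    s, c = np.sinh(w), np.cosh(w)
    # Standard normal density of s times the Jacobian of the transform z -> s.
    return delta * c / (sigma * np.sqrt(2.0 * np.pi * (1.0 + z ** 2))) * np.exp(-0.5 * s ** 2)

grid = np.linspace(-50.0, 50.0, 200001)
dx = grid[1] - grid[0]

# A skewed, heavier-tailed member of the family should still integrate to one.
pdf = sinh_arcsinh_pdf(grid, mu=1.0, sigma=2.0, epsilon=0.5, delta=0.8)
area = ((pdf[:-1] + pdf[1:]) * dx / 2.0).sum()   # trapezoidal rule

# With epsilon = 0 and delta = 1 the density reduces to the normal density.
normal_pdf = np.exp(-0.5 * grid ** 2) / np.sqrt(2.0 * np.pi)
max_diff_normal = np.abs(sinh_arcsinh_pdf(grid) - normal_pdf).max()

print(area, max_diff_normal)
```

In a distributional regression, each of the four parameters could in principle be given its own predictor (for example, age and sex of the respondent), which is what lets the fitted partner-age distribution change in location, spread, skewness and tail weight at once.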
