distributional regression

2021 ◽  
pp. 096228022110510
Author(s):  
Annika Strömer ◽  
Christian Staerk ◽  
Nadja Klein ◽  
Leonie Weinhold ◽  
Stephanie Titze ◽  
...  

We present a new procedure for enhanced variable selection for component-wise gradient boosting. Statistical boosting is a computational approach that emerged from machine learning and allows regression models to be fitted in the presence of high-dimensional data. Furthermore, the algorithm can lead to data-driven variable selection. In practice, however, the final models tend to include too many variables in some situations. This occurs particularly for low-dimensional data ([Formula: see text]), where we observe a slow overfitting behavior of boosting. As a result, more variables are included in the final model without altering the prediction accuracy. Many of these false positives are incorporated with a small coefficient and therefore have a small impact, but lead to a larger model. We try to overcome this issue by giving the algorithm the chance to deselect base-learners of minor importance. We analyze the impact of the new approach on variable selection and prediction performance in comparison to alternative methods, including boosting with earlier stopping as well as twin boosting. We illustrate our approach with data from an ongoing cohort study of chronic kidney disease patients, where the most influential predictors for the health-related quality of life measure are selected in a distributional regression approach based on beta regression.
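
The idea of deselecting base-learners of minor importance can be illustrated with a minimal sketch. It assumes squared-error loss, one linear base-learner per centered covariate, and a deselection rule that drops every base-learner whose share of the total risk reduction falls below a threshold `tau`, followed by a refit on the remaining covariates. The function names, the threshold value, and the simulated data are hypothetical and chosen for illustration; the sketch is not the authors' implementation.

```python
import random


def componentwise_l2_boost(X, y, n_iter=200, nu=0.1, active=None):
    """Component-wise L2 boosting with one linear base-learner per column.

    Assumes the columns of X are centered. Returns the intercept, the
    coefficient vector, and the risk (RSS) reduction attributed to each column.
    """
    n, p = len(X), len(X[0])
    active = list(range(p)) if active is None else list(active)
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    coefs = [0.0] * p
    risk_reduction = [0.0] * p
    col_ss = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j in active:
            if col_ss[j] == 0.0:
                continue
            sxr = sum(X[i][j] * resid[i] for i in range(n))
            gain = sxr * sxr / col_ss[j]  # RSS drop of the unshrunken fit
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, sxr / col_ss[j]
        if best_j is None:
            break
        step = nu * best_beta  # shrunken update of the best base-learner only
        coefs[best_j] += step
        resid = [resid[i] - step * X[i][best_j] for i in range(n)]
        # Exact RSS drop of the shrunken step: nu * (2 - nu) * gain.
        risk_reduction[best_j] += nu * (2.0 - nu) * best_gain
    return intercept, coefs, risk_reduction


def deselect(risk_reduction, tau=0.01):
    """Keep columns whose share of the total risk reduction is at least tau."""
    total = sum(risk_reduction)
    if total <= 0.0:
        return []
    return [j for j, r in enumerate(risk_reduction) if r / total >= tau]


# Simulated low-dimensional example: only columns 0 and 1 are informative.
rng = random.Random(1)
n, p = 200, 6
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
col_means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - col_means[j] for j in range(p)] for row in X]
y = [2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 1) for row in X]

_, coefs_full, rr = componentwise_l2_boost(X, y)
kept = deselect(rr, tau=0.01)
_, coefs_refit, _ = componentwise_l2_boost(X, y, active=kept)
```

In this sketch the refit only ever updates the columns in `kept`, so deselected covariates keep a coefficient of exactly zero in `coefs_refit`.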


2021 ◽  
pp. 271-296
Author(s):  
Paul Wiemann ◽  
Thomas Kneib ◽  
Helga Wagner

Author(s):  
Benedikt Schulz ◽  
Sebastian Lerch

Abstract Postprocessing ensemble weather predictions to correct systematic errors has become a standard practice in research and operations. However, only a few recent studies have focused on ensemble postprocessing of wind gust forecasts, despite its importance for severe weather warnings. Here, we provide a comprehensive review and systematic comparison of eight statistical and machine learning methods for probabilistic wind gust forecasting via ensemble postprocessing, which can be divided into three groups: state-of-the-art postprocessing techniques from statistics (ensemble model output statistics (EMOS), member-by-member postprocessing, isotonic distributional regression), established machine learning methods (gradient-boosting extended EMOS, quantile regression forests) and neural network-based approaches (distributional regression network, Bernstein quantile network, histogram estimation network). The methods are systematically compared using six years of data from a high-resolution, convection-permitting ensemble prediction system that was run operationally at the German weather service, and hourly observations at 175 surface weather stations in Germany. While all postprocessing methods yield calibrated forecasts and are able to correct the systematic errors of the raw ensemble predictions, incorporating information from additional meteorological predictor variables beyond wind gusts leads to significant improvements in forecast skill. In particular, we propose a flexible framework of locally adaptive neural networks with different probabilistic forecast types as output, which not only significantly outperform all benchmark postprocessing methods but also learn physically consistent relations associated with the diurnal cycle, especially the evening transition of the planetary boundary layer.
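
As a rough illustration of what an EMOS-type method does, the sketch below maps an ensemble to a parametric predictive distribution and scores it against an observation. It assumes a Gaussian predictive distribution with mean `a + b * ensemble_mean` and variance `c + d * ensemble_variance`, and it uses the closed-form continuous ranked probability score (CRPS) of a Gaussian as the score. The Gaussian family, the CRPS, the coefficient values, and the ensemble values are all assumptions made for illustration and are not taken from the study.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def emos_params(ensemble, a, b, c, d):
    """Gaussian EMOS-style link: mean and sd from ensemble mean and variance."""
    m = sum(ensemble) / len(ensemble)
    v = sum((x - m) ** 2 for x in ensemble) / (len(ensemble) - 1)
    return a + b * m, math.sqrt(c + d * v)


def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm_cdf(z) - 1.0)
                    + 2.0 * norm_pdf(z)
                    - 1.0 / math.sqrt(math.pi))


# Hypothetical five-member ensemble and hypothetical coefficients.
ensemble = [10.2, 11.5, 9.8, 12.1, 10.9]
mu, sigma = emos_params(ensemble, a=0.5, b=1.0, c=0.2, d=1.0)
score = crps_gaussian(mu, sigma, y=12.0)
```

In practice the coefficients `a`, `b`, `c`, `d` would be estimated by minimizing the average score over a training period; lower CRPS indicates a better probabilistic forecast.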


2021 ◽  
Vol 69 (3) ◽  
pp. 267-290
Author(s):  
Alexander Rauhut

Abstract Lexical ambiguity in the English language is abundant. Word-class ambiguity is even inherently tied to the productive process of conversion. Most lexemes are rather flexible when it comes to word class, which is facilitated by the minimal morphology that English has preserved. This study takes a multivariate quantitative approach to examine potential patterns that arise in a lexicon where verb-noun and noun-verb conversion are pervasive. The distributions of three inflectional suffixes, verbal -s, nominal -s, and -ed, are explored for their interaction with degrees of verb-noun conversion. To achieve that, the lexical dispersion, context-dependency, and lexical similarity between the inflected and bare forms were taken into consideration and controlled for in a Generalized Additive Model for Location, Scale and Shape (GAMLSS; Stasinopoulos, M. D., R. A. Rigby, and F. De Bastiani. 2018. “GAMLSS: A Distributional Regression Approach.” Statistical Modelling 18 (3–4): 248–73). The results of a series of zero-one-inflated beta models suggest that there is a clear “uncanny” valley of lexemes that show similar proportions of verbal and nominal uses. Such lexemes have a lower proportion of inflectional uses when textual dispersion and context-dependency are controlled for. Furthermore, as soon as there is some degree of conversion, the probability that a lexeme is always encountered without inflection sharply rises. Disambiguation by means of inflection is unlikely to play a uniform role depending on the inflectional distribution of a lexeme.
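
A zero-one-inflated beta model treats a proportion as a mixture of point masses at 0 and 1 and a beta density on the open interval. The sketch below writes down one common form of its log-density, with point masses `p0` and `p1` and a beta component in the mean-precision form (`mu`, `phi`). This parameterization and the example proportions are assumptions for illustration and may differ from the one used in the GAMLSS software.

```python
import math


def zoib_logpdf(y, p0, p1, mu, phi):
    """Log-density of a zero-one-inflated beta distribution.

    p0, p1: probabilities of observing exactly 0 and exactly 1.
    mu, phi: mean and precision of the beta part on (0, 1),
             i.e. shape parameters a = mu * phi and b = (1 - mu) * phi.
    """
    if not 0.0 <= y <= 1.0:
        raise ValueError("y must lie in [0, 1]")
    if y == 0.0:
        return math.log(p0)
    if y == 1.0:
        return math.log(p1)
    a, b = mu * phi, (1.0 - mu) * phi
    log_beta_density = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                        + (a - 1.0) * math.log(y)
                        + (b - 1.0) * math.log(1.0 - y))
    return math.log(1.0 - p0 - p1) + log_beta_density


# Hypothetical proportions of inflected uses for five lexemes.
proportions = [0.0, 0.0, 0.12, 0.4, 1.0]
loglik = sum(zoib_logpdf(y, p0=0.3, p1=0.1, mu=0.3, phi=5.0)
             for y in proportions)
```

In a distributional regression, each of `p0`, `p1`, `mu`, and `phi` would be linked to its own additive predictor, which is how the probability of a lexeme never being inflected can be modelled separately from the proportion of inflected uses.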


PLoS ONE ◽  
2021 ◽  
Vol 16 (8) ◽  
pp. e0255073
Author(s):  
Given Moonga ◽  
Stephan Böse-O’Reilly ◽  
Ursula Berger ◽  
Kenneth Harttgen ◽  
Charles Michelo ◽  
...  

Background The burden of child under-nutrition remains a global challenge, with greater severity being faced by low- and middle-income countries, despite the strategies in the Sustainable Development Goals (SDGs). Globally, malnutrition is one of the most important risk factors associated with illness and death, affecting hundreds of millions of pregnant women and young children. Sub-Saharan Africa is one of the regions in the world struggling with the burden of chronic malnutrition. The 2018 Zambia Demographic and Health Survey (ZDHS) report estimated that 35% of children under five years of age are stunted. The objective of this study was to analyse the distribution of stunting in Zambia and its associated factors. Methods We analysed the relationships between socio-economic and remotely sensed characteristics and anthropometric outcomes in children under five, using Bayesian distributional regression. Georeferenced data were available for 25,852 children from two waves of the ZDHS; 31% of observations were from the 2007 wave and 69% from the 2013/14 wave. We assessed the linear, non-linear and spatial effects of covariates on the height-for-age z-score. Results Stunting decreased between 2007 and 2013/14 from a mean z-score of -1.59 (credible interval (CI): -1.63; -1.55) to -1.47 (CI: -1.49; -1.44). We found a strong non-linear relationship for the education of the mother and the wealth of the household on the height-for-age z-score. Moreover, increasing levels of maternal education above the eighth grade were associated with a reduced variation of stunting. Our study finds that remotely sensed covariates alone explain little of the variation of the height-for-age z-score, which highlights the importance of collecting socio-economic characteristics and of controlling for them at the level of the individual and the household.
Conclusions Stunting remains unacceptably high in Zambia, with remarkable regional inequalities, and the decline is lagging behind goal two of the SDGs. This emphasises the need for policies that help to reduce the share of chronically malnourished children within Zambia.


eLife ◽  
2021 ◽  
Vol 10 ◽  
Author(s):  
Timothy M Wolock ◽  
Seth Flaxman ◽  
Kathryn A Risher ◽  
Tawanda Dadirai ◽  
Simon Gregson ◽  
...  

The age dynamics of sexual partnership formation determine patterns of sexually transmitted disease transmission and have long been a focus of researchers studying human immunodeficiency virus. Data on self-reported sexual partner age distributions are available from a variety of sources. We sought to explore statistical models that accurately predict the distribution of sexual partner ages over age and sex. We identified which probability distributions and outcome specifications best captured variation in partner age and quantified the benefits of modelling these data using distributional regression. We found that distributional regression with a sinh-arcsinh distribution replicated observed partner age distributions most accurately across three geographically diverse data sets. This framework can be extended with well-known hierarchical modelling tools and can help improve estimates of sexual age-mixing dynamics.
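
For orientation, the sinh-arcsinh family can be written down in a few lines. The sketch below uses the Jones and Pewsey form with location `mu`, scale `sigma`, skewness `eps`, and tail weight `delta`, where `eps = 0` and `delta = 1` recover the normal distribution. The parameter names and example values are assumptions for illustration, and the parameterization used in the study may differ.

```python
import math


def sas_pdf(x, mu=0.0, sigma=1.0, eps=0.0, delta=1.0):
    """Density of the sinh-arcsinh distribution (Jones-Pewsey form)."""
    z = (x - mu) / sigma
    s = math.sinh(delta * math.asinh(z) - eps)
    c = math.sqrt(1.0 + s * s)  # equals cosh(delta * asinh(z) - eps)
    return (delta * c * math.exp(-0.5 * s * s)
            / (sigma * math.sqrt(2.0 * math.pi * (1.0 + z * z))))


def sas_from_normal(z, mu=0.0, sigma=1.0, eps=0.0, delta=1.0):
    """Transform a standard normal draw z into a sinh-arcsinh draw."""
    return mu + sigma * math.sinh((math.asinh(z) + eps) / delta)


def trapezoid(f, lo, hi, n):
    """Composite trapezoidal rule on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


# A skewed, heavier-tailed member of the family (hypothetical values).
total_mass = trapezoid(lambda x: sas_pdf(x, eps=0.5, delta=0.8),
                       -40.0, 40.0, 8000)
```

In a distributional regression, each of the four parameters can depend on covariates such as respondent age and sex, which lets the skewness and tail weight of the partner-age distribution vary across groups.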

