Quantile Generalized Additive Model: a Robust Alternative to the Generalized Additive Model

2021 ◽  
Vol 10 (1) ◽  
pp. 12-18
Author(s):  
Nwakuya Maureen Tobechukwu

Nonparametric regression is used when the structure of the relationship between the response and the predictor variables is unknown: it estimates that structure rather than assuming a predetermined form. The generalized additive model (GAM) and the quantile generalized additive model (QGAM) provide an attractive framework for nonparametric regression. The QGAM targets features of the response beyond the central tendency, while the GAM targets the mean response. The analysis was done using the gam and qgam packages in R, on a data set of live births, fertility rate and birth rate, with live births as the response and fertility rate and birth rate as the predictors. A spline basis was used, with the smoothing parameter selected by marginal loss minimization. The results show that the basis dimension used was sufficient. The QGAM results show the effect of the smooth functions on the response at the 25th, 50th, 75th and 95th quantiles, while the GAM shows only the effect of the predictors on the mean response. The results also reveal that the QGAM has a lower Akaike information criterion (AIC) and generalized cross-validation (GCV) score than the GAM, hence producing a better model. It was also observed that the QGAM at the 50th quantile and the GAM had the same adjusted R² (77%), meaning that both models explained the same percentage of variation; we attribute this to mean regression and median regression being approximately the same, in agreement with existing literature. The plots reveal that some residuals of the GAM fell outside the confidence band, while in the QGAM all residuals fell within it, producing a better smooth.
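The difference between the mean-focused GAM and the quantile-focused QGAM comes down to the loss each one minimizes: QGAM fits its smooths under the pinball (check) loss at a chosen quantile τ, rather than squared error. A minimal NumPy sketch of that loss (illustrative only; the study's actual analysis used the R gam and qgam packages):

```python
import numpy as np

def pinball_loss(y, pred, tau):
    """Check (pinball) loss minimized by quantile regression at quantile tau."""
    e = np.asarray(y, dtype=float) - pred
    return float(np.mean(np.where(e >= 0, tau * e, (tau - 1) * e)))

# Over constant predictions, the tau-th sample quantile minimizes pinball
# loss, just as the sample mean minimizes squared error -- which is why a
# QGAM targets quantiles of the response while a GAM targets its mean.
rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=10_000)
q75 = np.quantile(y, 0.75)
loss_at_quantile = pinball_loss(y, q75, 0.75)
loss_elsewhere = pinball_loss(y, q75 + 1.0, 0.75)
```

Evaluating the loss at the 75th sample quantile versus away from it shows the minimizing property directly.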

Author(s):  
Yousef-Awwad Daraghmi ◽  
Eman Yaser Daraghmi ◽  
Motaz Daadoo ◽  
Samer Alsaadi

Smart energy requires accurate and efficient short-term electric load forecasting to enable efficient energy management and active real-time power control. Forecasting accuracy is influenced by the characteristics of electrical load, particularly overdispersion, nonlinearity, autocorrelation and seasonal patterns. Although several fundamental forecasting methods have been proposed, accurate and efficient forecasting methods that can consider all electric load characteristics are still needed. Therefore, we propose a novel model for short-term electric load forecasting. The model adopts negative binomial additive models (NBAM) for handling overdispersion and capturing the nonlinearity of electric load. To address the seasonality, the daily load pattern is classified into high, moderate, and low seasons, and the autocorrelation of load is modeled separately in each season. We also consider the efficiency of forecasting, since the NBAM captures the behavior of predictors by smooth functions that are estimated via a scoring algorithm with low computational demand. The proposed NBAM is applied to a real-world data set from Jericho city, and its accuracy and efficiency outperform those of the other models used in this context.
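Overdispersion, the property motivating the negative binomial choice here, means the variance of the counts exceeds their mean, which a Poisson model (variance equal to mean) cannot represent. A NumPy sketch of the diagnostic on simulated counts (parameter values are hypothetical, not from the paper):

```python
import numpy as np

def dispersion_index(counts):
    """Variance-to-mean ratio; values well above 1 indicate overdispersion,
    which a negative binomial model can capture but a Poisson cannot."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# A negative binomial with mean mu and dispersion alpha has variance
# mu + alpha * mu**2, so its dispersion index is 1 + alpha * mu.
# Simulate it via its gamma-Poisson mixture representation.
rng = np.random.default_rng(1)
mu, alpha = 50.0, 0.2
lam = rng.gamma(shape=1 / alpha, scale=mu * alpha, size=50_000)
loads = rng.poisson(lam)
D = dispersion_index(loads)   # expected near 1 + alpha * mu = 11
```

A dispersion index this far above 1 is the signal that a Poisson-family additive model should be replaced by an NBAM.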


2021 ◽  
Vol 12 ◽  
Author(s):  
Haiyan Zhu ◽  
Chenqiong Zhao ◽  
Peiwen Xiao ◽  
Songying Zhang

Capsule: We designed a predictive reference model to evaluate how many stimulation cycles are needed for a patient to achieve an ideal live birth rate using assisted reproductive technology.
Objective: To develop a counseling tool for women who wish to undergo assisted reproductive technology (ART) treatment, predicting the likelihood of live birth based on age and number of oocytes retrieved.
Methods: This was a 6-year population-based retrospective cohort analysis using individual patient ART data. Between 2012 and 2017, 17,948 women were analyzed from their single ovarian stimulation cycle until they had a live birth or had used all their embryos. All consecutive women between 20 and 49 years old undergoing ovarian stimulation cycles for ART in our center were enrolled. The cumulative live birth rate (CLBR) was defined as the delivery of a live neonate born during fresh or subsequent frozen–thawed embryo transfer cycles. Only the first delivery was considered in the analysis. Binary logistic regression was performed to identify and adjust for factors known to affect the CLBR independently. A generalized additive model was used to build a predictive model of the CLBR according to the woman's age and the number of oocytes retrieved.
Results: An evidence-based counseling tool was created to predict the probability of an individual woman having a live birth, based on her age and the number of oocytes retrieved in ART cycles. The model was verified by 10 times 10-fold cross-validation using the preprocessed data, and 100 area under the curve (AUC) values for receiver operating characteristic (ROC) curves were obtained on the test set. The mean AUC value was 0.7394. Our model predicts CLBRs ranging from nearly 90% to less than 20% for women aged 20–49 years with at least 22 oocytes retrieved. The CLBRs of women aged 20–28 years were very similar, lying nearly on one trend line for a given number of oocytes retrieved. Differences in the CLBR began to appear by the age of 29 years and increased gradually in women aged >35 years.
Conclusion: A predictive model of the CLBR was designed to serve as a guide for physicians and for patients considering ART treatment. The number of oocytes that need to be retrieved to achieve a live birth depends on the woman's age.
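The reported mean AUC of 0.7394 has a simple probabilistic reading: the AUC is the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative one. A hand-rolled NumPy sketch of that equivalence (illustrative only, not the authors' cross-validation pipeline):

```python
import numpy as np

def auc(scores, labels):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive case outranks a random negative one (ties count half)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos, neg = scores[labels], scores[~labels]
    # All pairwise comparisons; O(n_pos * n_neg), fine for illustration.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

# Perfect ranking of positives above negatives yields AUC 1.0.
a = auc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
```

An AUC near 0.74, as in the abstract, would mean a live-birth case outranks a non-live-birth case about 74% of the time.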


Author(s):  
Yuanchang Xie ◽  
Yunlong Zhang

Recent crash frequency studies have been based primarily on generalized linear models, in which a linear relationship is usually assumed between the logarithm of the expected crash frequency and the explanatory variables. For some explanatory variables, this linear assumption may be invalid, so it is worthwhile to investigate other forms of relationship. This paper introduces generalized additive models for modeling crash frequency. Generalized additive models use smooth functions of each explanatory variable and are very flexible in modeling nonlinear relationships. On the basis of an intersection crash frequency data set collected in Toronto, Canada, a negative binomial generalized additive model is compared with two negative binomial generalized linear models. The comparison shows that the negative binomial generalized additive model performs best in terms of both the Akaike information criterion and fitting and predictive performance.
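The AIC comparison used here penalizes model complexity as well as rewarding fit: AIC = 2k − 2 ln L, with lower values better, where for a GAM the parameter count k is the (possibly non-integer) effective degrees of freedom of its smooths. A pure-Python sketch with hypothetical likelihood values (not the paper's fitted numbers):

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2*lnL; lower is better."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical values: the GAM fits better (higher log-likelihood) but
# spends more effective degrees of freedom on its smooth terms.
aic_glm = aic(log_likelihood=-1200.0, n_params=5)
aic_gam = aic(log_likelihood=-1150.0, n_params=9.3)  # edf need not be integer
better = "GAM" if aic_gam < aic_glm else "GLM"
```

The point of the penalty term is that a smooth term only wins the comparison when its likelihood gain outweighs its added flexibility.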


1996 ◽  
Vol 35 (4I) ◽  
pp. 385-398 ◽  
Author(s):  
John C. Caldwell

The significance of the Asian fertility transition can hardly be overestimated. The relatively sanguine view of population growth expressed at the 1994 International Conference on Population and Development (ICPD) in Cairo was possible only because of the demographic events in Asia over the last 30 years. In 1965 Asian women were still bearing about six children. Even at current rates, today's young women will give birth to half as many. This measure, namely the average number of live births over a reproductive lifetime, is called the total fertility rate. It has to be above 2 (considerably above if mortality is still high) to achieve long-term population replacement. By 1995 East Asia, taken as a whole, exhibited a total fertility rate of 1.9. Elsewhere, Singapore was below long-term replacement, Thailand had just achieved it, and Sri Lanka was only a little above. The role of Asia in the global fertility transition is shown by estimates I made a few years ago for a World Bank Planning Meeting covering the first quarter-century of the Asian transition [Caldwell (1993), p. 300]. Between 1965 and 1988 the world's annual birth rate fell by 22 percent. In 1988 there would have been 40 million more births if there had been no decline from 1965 fertility levels. Of that total decline in the world's births, almost 80 percent was contributed by Asia, compared with only 10 percent by Latin America, nothing by Africa, and, unexpectedly, 10 percent by the high-income countries of the West. Indeed, 60 percent of the decline was produced by two countries, China and India, even though they constitute only 38 percent of the world's population. They accounted, between them, for over three-quarters of Asia's fall in births.
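The total fertility rate defined above is computed by summing age-specific fertility rates over the reproductive span; with rates reported for 5-year age groups, each group contributes five years of exposure. A small sketch of that calculation using hypothetical rates (purely illustrative, not actual data from any country in the passage):

```python
# Age-specific fertility rates (births per woman per year), ages 15-49.
# These values are hypothetical, chosen only to illustrate the arithmetic.
asfr_per_5yr_group = {
    "15-19": 0.02, "20-24": 0.10, "25-29": 0.12,
    "30-34": 0.08, "35-39": 0.04, "40-44": 0.01, "45-49": 0.002,
}

# Each 5-year group contributes 5 years of exposure, so the total fertility
# rate is 5 times the sum of the group rates: average births per woman over
# a full reproductive lifetime.
tfr = 5 * sum(asfr_per_5yr_group.values())
```

A value below roughly 2.1 (the conventional replacement level under low mortality) implies long-term population decline, as with East Asia's 1.9 in the passage.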


2020 ◽  
Vol 72 (1) ◽  
Author(s):  
Chao Xiong ◽  
Claudia Stolle ◽  
Patrick Alken ◽  
Jan Rauberg

Abstract: In this study, we have derived field-aligned currents (FACs) from magnetometers onboard the Defense Meteorological Satellite Program (DMSP) satellites. The magnetic latitude versus local time distribution of the DMSP FACs shows dependences on the intensity and orientation of the interplanetary magnetic field (IMF) By and Bz components comparable to previous findings, which confirms the reliability of the DMSP FAC data set. With simultaneous measurements of precipitating particles from DMSP, we further investigate the relation between large-scale FACs and precipitating particles. Our results show that precipitating electron and ion fluxes both increase in magnitude and extend to lower latitudes for enhanced southward IMF Bz, similar to the behavior of FACs. Under weak northward and southward Bz conditions, the locations of the R2 current maxima, at both the dusk and dawn sides and in both hemispheres, are found to be close to the maxima of the particle energy fluxes, while for the same IMF conditions the R1 currents are displaced farther from the respective particle flux peaks. The largest displacement (about 3.5°) is found between the downward R1 current and the ion flux peak on the dawn side. Our results suggest that there exist systematic differences in the locations of electron/ion precipitation and large-scale upward/downward FACs. As outlined by the statistical means of these two parameters, the FAC peaks enclose the particle energy flux peaks in an auroral band at both the dusk and dawn sides. Our comparisons also show that particle precipitation at dawn and dusk, in both hemispheres, maximizes near the mean R2 current peaks; the particle precipitation flux maxima closer to the R1 current peaks are lower in magnitude. This is opposite to the known feature that R1 currents are on average stronger than R2 currents.


Author(s):  
Luigi Lombardo ◽  
Hakan Tanyas

Abstract: Ground motion scenarios exist for most of the seismically active areas around the globe. They essentially correspond to shaking-level maps at given earthquake return times, used as a reference for the areas likely under threat from future ground displacements. Because landslides in seismically active regions are closely controlled by the ground motion, one would expect landslide susceptibility maps to change as the ground motion patterns change in space and time. However, so far, statistically based landslide susceptibility assessments have primarily been treated as time-invariant. In other words, the vast majority of statistical models do not include the temporal effect of the main trigger in future landslide scenarios. In this work, we present an approach aimed at filling this gap, bridging current practices in the seismological community to those in the geomorphological and statistical ones. More specifically, we select an earthquake-induced landslide inventory corresponding to the 1994 Northridge earthquake and build a Bayesian generalized additive model of the binomial family, featuring common morphometric and thematic covariates as well as the peak ground acceleration generated by the Northridge earthquake. Once each model component was estimated, we ran 1000 simulations for each of the 217 possible ground motion scenarios for the study area. From each batch of 1000 simulations, we estimated the mean and 95% credible interval to represent the mean susceptibility pattern under a specific earthquake scenario, together with its uncertainty level. Because each earthquake scenario has a specific return time, our simulations allow us to incorporate the temporal dimension into any susceptibility model, thereby driving the results toward the definition of landslide hazard. Ultimately, we also share our results in vector format (a .mif file that can easily be converted into a common shapefile), reporting the mean (and uncertainty) susceptibility of each 1000-simulation batch for each of the 217 scenarios.
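The per-scenario summary described above, a mean and a 95% credible interval from each batch of 1000 simulations, reduces to simple quantile arithmetic on the draws. A NumPy sketch for a single map cell (the draws below are synthetic, not the study's output):

```python
import numpy as np

# 1000 simulated susceptibility values for one map cell under one scenario.
# A Beta distribution is used only to generate plausible values in [0, 1].
rng = np.random.default_rng(42)
draws = rng.beta(a=2.0, b=5.0, size=1000)

# Summaries reported per cell and per scenario: the posterior-predictive
# mean and the central 95% credible interval.
mean_susc = draws.mean()
ci_lo, ci_hi = np.percentile(draws, [2.5, 97.5])
```

Repeating this over every cell and every one of the 217 scenarios yields exactly the mean-and-uncertainty maps the abstract describes.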


Risks ◽  
2021 ◽  
Vol 9 (3) ◽  
pp. 53
Author(s):  
Yves Staudt ◽  
Joël Wagner

For calculating non-life insurance premiums, actuaries traditionally rely on separate severity and frequency models using covariates to explain the claims loss exposure. In this paper, we focus on the claim severity. First, we build two reference models, a generalized linear model and a generalized additive model, relying on a log-normal distribution of the severity and including the most significant factors; thereby, we relate the continuous variables to the response in a nonlinear way. In the second step, we tune two random forest models, one for the claim severity and one for the log-transformed claim severity, where the latter requires a back-transformation of the predicted results. We compare the prediction performance of the different models using the relative error, the root mean squared error and goodness-of-lift statistics in combination with goodness-of-fit statistics. In our application, we rely on a dataset of a Swiss collision insurance portfolio covering the loss exposure of the period from 2011 to 2015 and including observations from 81,309 settled claims with a total amount of CHF 184 million. In the analysis, we use the data from 2011 to 2014 for training and from 2015 for testing. Our results indicate that a log-normal transformation of the severity does not lead to performance gains with random forests; however, random forests with a log-normal transformation are the preferred choice for explaining right-skewed claims. Finally, when considering all indicators, we conclude that the generalized additive model has the best overall performance.
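The back-transformation issue mentioned for the log-transformed model is worth making concrete: exponentiating a prediction of log-severity recovers the median, not the mean, of a log-normal claim, since E[Y] = exp(μ + σ²/2). A NumPy sketch of the standard log-normal retransformation on simulated claims (parameters are illustrative, not from the paper's dataset):

```python
import numpy as np

# Simulate log-normal claim severities with log-scale location mu and
# spread sigma (hypothetical values for illustration).
rng = np.random.default_rng(7)
mu, sigma = 8.0, 1.2
claims = rng.lognormal(mean=mu, sigma=sigma, size=200_000)

log_claims = np.log(claims)
naive = np.exp(log_claims.mean())                    # ~ median: biased low
corrected = np.exp(log_claims.mean() + log_claims.var() / 2)  # ~ E[Y]
true_mean = claims.mean()
```

Without the σ²/2 correction, expected severities, and hence premiums, would be systematically understated for right-skewed claims.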


2021 ◽  
Vol 4 (1) ◽  
Author(s):  
Sidney R. Lehky ◽  
Keiji Tanaka ◽  
Anne B. Sereno

Abstract: When measuring sparseness in neural populations as an indicator of efficient coding, an implicit assumption is that each stimulus activates a different random set of neurons; in other words, population responses to different stimuli are, on average, uncorrelated. Here we examine neurophysiological data from four lobes of macaque monkey cortex, including V1, V2, MT, anterior inferotemporal cortex, lateral intraparietal cortex, the frontal eye fields, and perirhinal cortex, to determine how correlated population responses actually are. We call the mean correlation the pseudosparseness index, because high pseudosparseness can mimic the statistical properties of sparseness without being authentically sparse. In every data set we find high levels of pseudosparseness, ranging from 0.59 to 0.98, substantially greater than the value of 0.00 for authentic sparseness. This was true for synthetic and natural stimuli, as well as for single-electrode and multielectrode data. A model indicates that a key variable producing high pseudosparseness is the standard deviation of spontaneous activity across the population. Consistently high values of pseudosparseness in the data demand reconsideration of the sparse-coding literature, as well as consideration of the degree to which authentic sparseness provides a useful framework for understanding neural coding in the cortex.
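The pseudosparseness index, as defined in the abstract, is the mean correlation between population response vectors to different stimuli. A NumPy sketch on synthetic data, where a per-neuron baseline (playing the role of the spread in spontaneous activity the abstract identifies as the key variable) induces the correlation:

```python
import numpy as np

# Synthetic population: each neuron has a fixed baseline rate, and every
# stimulus adds independent noise on top. The baseline spread across the
# population is what drives the correlation between stimulus responses.
rng = np.random.default_rng(3)
n_neurons, n_stimuli = 100, 40
baseline = rng.normal(10.0, 4.0, size=n_neurons)              # per-neuron offset
responses = baseline[:, None] + rng.normal(0.0, 1.0, size=(n_neurons, n_stimuli))

# Correlate stimulus response vectors (columns) against each other and
# average the off-diagonal entries: the pseudosparseness index.
corr = np.corrcoef(responses, rowvar=False)
off_diag = corr[~np.eye(n_stimuli, dtype=bool)]
pseudosparseness = off_diag.mean()
```

With a baseline spread several times the stimulus-driven variability, the index lands near the top of the 0.59 to 0.98 range the paper reports; shrinking the baseline spread toward zero drives it toward authentic sparseness at 0.00.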

