Accelerating Solution of Generalized Linear Models by Solving Normal Equation Using GPGPU on a Large Real-World Tall-Skinny Data Set

Author(s):  
Tran Van Sang ◽  
Ryosuke Kobayashi ◽  
Rie S. Yamaguchi ◽  
Toshiyuki Nakata
1991 ◽  
Vol 48 (4) ◽  
pp. 619-622 ◽  
Author(s):  
Chris D. Bajdik ◽  
David C. Schneider

Generalized linear models were used to investigate the sensitivity of parameter estimates to choice of the random error assumption in models of fisheries data. We examined models of fish yield from lakes as a function of (i) Ryder's morphoedaphic index, (ii) lake area, lake depth, and concentration of dissolved solids, and (iii) fishing effort. Models were fit using a normal, log-normal, gamma, or Poisson distribution to generate the random error. Plots of standardized Pearson residuals and standardized deviance residuals were used to evaluate the distributional assumptions. For each data set, observations were found to be consistent with several distributions; however, some distributions were shown to be clearly inappropriate. Inappropriate distributional assumptions produced substantially different parameter estimates. Generalized linear models allow a variety of distributional assumptions to be incorporated in a model, and thereby let us study their effects.
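The two residual types named in this abstract can be illustrated for the Poisson case. What follows is a minimal sketch, not the authors' code: the function names and the sample counts are hypothetical, and only the raw residuals are computed. The standardized versions used in the paper would additionally rescale each residual by the observation's leverage from the fitted model, a step omitted here.

```python
import math

def poisson_pearson_residual(y, mu):
    """Raw Pearson residual: (y - mu) / sqrt(variance), with variance = mu."""
    return (y - mu) / math.sqrt(mu)

def poisson_deviance_residual(y, mu):
    """Signed square root of the Poisson unit deviance."""
    # y * log(y / mu) is taken as 0 when y == 0 (its limiting value).
    term = y * math.log(y / mu) if y > 0 else 0.0
    unit_deviance = 2.0 * (term - (y - mu))
    sign = 1.0 if y >= mu else -1.0
    # max() guards against tiny negative values caused by rounding.
    return sign * math.sqrt(max(unit_deviance, 0.0))

# Hypothetical observed counts and fitted means, for illustration only.
observed = [0, 3, 7, 12]
fitted = [1.5, 3.0, 5.0, 9.0]

for y, mu in zip(observed, fitted):
    pearson = poisson_pearson_residual(y, mu)
    deviance = poisson_deviance_residual(y, mu)
    print(y, mu, round(pearson, 3), round(deviance, 3))
```

When the assumed distribution suits the data, the two residual types tend to tell a similar story; a marked pattern in either plot is the kind of evidence the abstract describes using to rule a distribution out.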


Author(s):  
Justin C. Touchon

Whether at the undergraduate, graduate, or post-graduate level, Applied Statistics with R: A Practical Guide for the Life Sciences teaches readers to properly analyze data in an efficient, accessible, plainspoken, frank, and occasionally humorous manner. Readers will come away with the knowledge of which analyses they should use and when they should use them, an important skill in an age when the statistical analyses used in the life-sciences are becoming increasingly advanced. This book uses the statistical language R, which is the choice of ecologists worldwide and is rapidly becoming the ‘go-to’ stats program throughout the life-sciences. Written around a single real-world dataset, Applied Statistics with R encourages readers to become deeply familiar with an imperfect but realistic set of data, much like they themselves might collect. Early chapters are designed to teach basic data manipulation skills and build good habits in preparation for learning more advanced analyses. This approach also demonstrates the importance of viewing data through different lenses, facilitating an easy and natural progression from linear and generalized linear models through to mixed effects versions of those same analyses. Readers will also learn advanced plotting and data-wrangling techniques, and gain an introduction to writing their own functions. Applied Statistics with R is suitable for senior undergraduate and graduate students, professional researchers, and practitioners throughout the life-sciences, whether in the fields of ecology, evolution, environmental studies, or computational biology.


2013 ◽  
Vol 35 (1) ◽  
pp. 98
Author(s):  
Angela Radünz Lazzari

Air pollution is a risk factor for population health. Its harmful effects are observed even when atmospheric pollutant levels are within the limits set out in specific legislation, and they manifest mainly as respiratory diseases. The aim of this study was to analyze the relationship between the concentrations of air pollutants and the incidence of respiratory diseases in the city of Porto Alegre, in 2005 and 2006. Multiple linear regression analysis, ordinal logistic regression and generalized linear models were applied. The results show a good fit for all three techniques. The ordinal logistic regression detected only a positive influence of air temperature and relative humidity on hospitalizations for respiratory diseases. Multiple linear regression related hospitalizations negatively to meteorological variables and positively to particulate matter (PM10). The generalized linear model detected a negative influence of meteorological variables and a positive influence of the pollutants tropospheric ozone (O3) and PM10 on hospitalizations. Comparing the three statistical techniques on the same data set, it can be concluded that all of them produced a model with good fit to the data, but the generalized linear model showed higher sensitivity in capturing the influence of pollutants than ordinal logistic regression and multiple linear regression.


Author(s):  
Wagner Hugo Bonat

Abstract: We present a general statistical modelling framework for handling multivariate mixed types of outcomes in the context of quantitative genetic analysis. The models are based on the multivariate covariance generalized linear models, where the matrix linear predictor is composed of an identity matrix combined with a relatedness matrix defined by a pedigree, representing the environmental and genetic components, respectively. We also propose a new index of heritability for non-Gaussian data. A case study on a house sparrow (Passer domesticus) population with continuous, binomial and count outcomes is employed to motivate the new model. Simulation of multivariate marginal models is not trivial; we therefore adapt the NORTA (Normal to anything) algorithm for simulation of multivariate covariance generalized linear models in the context of genetic data analysis. A simulation study is presented to assess the asymptotic properties of the estimating function estimators for the correlation between outcomes and the new heritability index parameters. The data set and R code are available in the supplementary material.
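The core idea behind NORTA can be sketched in a few lines: draw correlated normal variates, map them to uniforms through the normal CDF, then push the uniforms through the inverse CDFs of the desired marginals. The example below is only an illustration under stated assumptions, not the authors' adaptation: the function name `norta_pair`, the choice of an exponential and a Bernoulli marginal, and all parameter values are hypothetical. It also skips the step in which the normal correlation is tuned so that the transformed outcomes reach a target correlation.

```python
import math
import random
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def norta_pair(rho, exp_rate, bern_p, rng):
    """Draw one (continuous, binary) pair whose dependence comes from a Gaussian pair."""
    # Step 1: two standard normals with correlation rho.
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
    # Step 2: map each normal to a uniform through the normal CDF.
    u1 = min(STANDARD_NORMAL.cdf(z1), 1.0 - 1e-12)  # clamp keeps log() finite
    u2 = STANDARD_NORMAL.cdf(z2)
    # Step 3: apply the inverse CDF of each (hypothetical) target marginal.
    continuous = -math.log(1.0 - u1) / exp_rate  # exponential marginal
    binary = 1 if u2 > 1.0 - bern_p else 0       # Bernoulli(bern_p) marginal
    return continuous, binary

rng = random.Random(42)
draws = [norta_pair(0.6, 2.0, 0.3, rng) for _ in range(5)]
print(draws)
```

The correlation between the transformed outcomes generally differs from `rho`, which is why a full NORTA implementation searches for the normal correlation that reproduces the desired one.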


1995 ◽  
Vol 124 (1) ◽  
pp. 61-70 ◽  
Author(s):  
J. A. Woolliams ◽  
Z. W. Luo ◽  
B. Villanueva ◽  
D. Waddington ◽  
P. J. Broadbent ◽  
...  

SUMMARY: Data on ovulation rate and numbers of ova and transferable embryos recovered from superovulated cattle and sheep were analysed using generalized linear models, quasi-likelihood, restricted maximum likelihood (REML) and generalized linear mixed models (GLMMS). The data pertained to the operation of nucleus breeding schemes in cattle and the commercial application of embryo transfer in sheep. Results of the analyses showed that generalized linear models involving Poisson and Binomial distributions were inappropriate because of over-dispersion, and that analyses using quasi-likelihood to model negative binomial and β-binomial distributions were more suitable. Factors identified as important in determining the results in cattle were the number of previous superovulations (a higher proportion of transferable embryos were obtained in the initial flush compared to subsequent recoveries in two out of three sets of data), the donor (significant in all analyses with repeated recoveries) and its mate (significant in some analyses). In sheep, the use of pFSH or hMG for superovulation increased embryo yields above those obtained with PMSG + GnRH. Analyses of a further data set for sheep showed the effect of breed was ambiguous. The effects of donors and their mates were treated as random effects in analyses involving REML and GLMMS. Results showed that the repeatability of the number of transferable embryos produced per donor ranged between 0·13 and 0·23 in three sets of data and was significant in all cases. In these analyses the variance among mates was not significantly different from zero. The results of analyses were used to develop a random generator to simulate the numbers of ova and embryos recovered from a cow following superovulation. By sampling from negative binomial distributions where the scale factor used for each cow was a normally distributed deviate, distributions were obtained which had the same mean, variance and repeatability as those observed.
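A generator of the kind described at the end of this summary might look roughly like the sketch below. It is not the authors' generator. The abstract does not say how the normal deviate enters the model, so the sketch assumes it shifts each cow's mean on the log scale; the function names and every parameter value are hypothetical. Negative binomial counts are drawn as a gamma-Poisson mixture, which gives variance mean + mean²/shape.

```python
import math
import random

def sample_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def sample_negative_binomial(mean, shape, rng):
    """Gamma-Poisson mixture with the given mean and shape."""
    rate = rng.gammavariate(shape, mean / shape)  # gamma with expectation `mean`
    return sample_poisson(rate, rng)

def simulate_cow(n_flushes, log_mean, cow_sd, shape, rng):
    """Counts for repeated recoveries from one cow (assumed log-scale cow effect)."""
    cow_effect = rng.gauss(0.0, cow_sd)  # one normal deviate per cow
    cow_mean = math.exp(log_mean + cow_effect)
    return [sample_negative_binomial(cow_mean, shape, rng) for _ in range(n_flushes)]

# Hypothetical settings: 4 cows, 3 recoveries each, overall mean near 5 embryos.
rng = random.Random(1)
herd = [simulate_cow(3, math.log(5.0), 0.4, 2.0, rng) for _ in range(4)]
print(herd)
```

Because each cow keeps the same deviate across its recoveries, counts from the same cow are correlated, which is what produces a non-zero repeatability in the simulated data.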


Author(s):  
Yuanchang Xie ◽  
Yunlong Zhang

Recent crash frequency studies have been based primarily on generalized linear models, in which a linear relationship is usually assumed between the logarithm of expected crash frequency and other explanatory variables. For some explanatory variables, such a linear assumption may be invalid. It is therefore worthwhile to investigate other forms of relationships. This paper introduces generalized additive models to model crash frequency. Generalized additive models use smooth functions of each explanatory variable and are very flexible in modeling nonlinear relationships. On the basis of an intersection crash frequency data set collected in Toronto, Canada, a negative binomial generalized additive model is compared with two negative binomial generalized linear models. The comparison results show that the negative binomial generalized additive model performs best in terms of both the Akaike information criterion and fitting and prediction performance.
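The contrast drawn in this abstract can be written out compactly. The notation below is an assumption introduced for illustration (the abstract defines no symbols): \mu_i is the expected crash frequency at site i, x_{ij} is the j-th explanatory variable, \beta_j are regression coefficients, and f_j are smooth functions estimated from the data.

```latex
% Generalized linear model with log link: linear in each explanatory variable
\log \mu_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}

% Generalized additive model with log link: a smooth function of each variable
\log \mu_i = \beta_0 + \sum_{j=1}^{p} f_j(x_{ij})
```

Setting f_j(x) = \beta_j x recovers the generalized linear model, so the additive model can be read as relaxing the linearity assumption one variable at a time.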


2020 ◽  
Vol 02 ◽  
Author(s):  
RM Garcia ◽  
WF Vieira-Junior ◽  
JD Theobaldo ◽  
NIP Pini ◽  
GM Ambrosano ◽  
...  

Objective: To evaluate color and roughness of bovine enamel exposed to dentifrices, dental bleaching with 35% hydrogen peroxide (HP), and erosion/staining by red wine. Methods: Bovine enamel blocks were exposed to: artificial saliva (control), Oral-B Pro-Health (stannous fluoride with sodium fluoride, SF), Sensodyne Repair & Protect (bioactive glass, BG), Colgate Pro-Relief (arginine and calcium carbonate, AR), or Chitodent (chitosan, CHI). After toothpaste exposure, half (n=12) of the samples were bleached (35% HP), and the other half were not (n=12). The color (CIE L*a*b*, ΔE), surface roughness (Ra), and scanning electron microscopy were evaluated. Color and roughness were assessed at baseline, post-dentifrice and/or -dental bleaching, and after red wine. The data were subjected to analysis of variance (ANOVA) (ΔE) for repeated measures (Ra), followed by Tukey's test. The L*, a*, and b* values were analyzed by generalized linear models (α=0.05). Results: The HP promoted an increase in Ra values; however, the SF, BG, and AR did not enable this alteration. After red wine, all groups apart from SF (unbleached) showed increases in Ra values; SF and AR promoted decreases in L* values; AR demonstrated higher ΔE values, differing from the control; and CHI decreased the L* variation in the unbleached group. Conclusion: Dentifrices did not interfere with bleaching efficacy of 35% HP. However, dentifrices acted as a preventive agent against surface alteration from dental bleaching (BG, SF, and AR) or red wine (SF). Dentifrices can decrease (CHI) or increase (AR and SF) staining by red wine.


2020 ◽  
Vol 9 (16) ◽  
pp. 1105-1115
Author(s):  
Shuqing Wu ◽  
Xin Cui ◽  
Shaoyu Zhang ◽  
Wenqi Tian ◽  
Jiazhen Liu ◽  
...  

Aim: This real-world data study investigated the economic burden and associated factors of readmissions for cerebrospinal fluid leakage (CSFL) post-cranial, transsphenoidal, or spinal index surgeries. Methods: Costs of CSFL readmissions and index hospitalizations during 2014–2018 were collected. Readmission cost was measured as absolute cost and as percentage of index hospitalization cost. Factors associated with readmission cost were explored using generalized linear models. Results: Readmission cost averaged US$2407–6106, 35–94% of index hospitalization cost. Pharmacy costs were the leading contributor. Generalized linear models showed transsphenoidal index surgery and surgical treatment for CSFL were associated with higher readmission costs. Conclusion: CSFL readmissions are a significant economic burden in China. Factors associated with higher readmission cost should be monitored.

