Count data as response variable.

Author(s):  
Donald L. J. Quicke ◽  
Buntika A. Butcher ◽  
Rachel A. Kruft Welton

Abstract This chapter is devoted specifically to count data for three reasons: (i) they are common in ecological studies (e.g. clutch sizes, numbers of fledglings from a nest, numbers of seeds per pod...); (ii) they are simple to collect and are therefore often the data collected by students (e.g. numbers of beetles in a pitfall trap, number of pollinator visits to flowers...); and (iii) they pose numerous issues that linear models with their normal error structure cannot deal with. Two studies with counts as the response variable will be examined: one that fits the ideals of a Poisson distribution fairly well and another that fits them less well. Example 1 deals with fledgling numbers in relation to clutch initiation date. The data are on the northern cardinal, Cardinalis cardinalis, and were collected to test the hypothesis that birds that start their clutches later may suffer higher pre-fledging offspring mortality. Example 2 focuses on pollinator flower visits in Passiflora speciosa in relation to flower size.


Author(s):  
Donald Quicke ◽  
Buntika A. Butcher ◽  
Rachel Kruft Welton

Abstract This chapter employs generalized linear modelling, using the function glm, for cases where we know that variances are not constant across one or more explanatory variables and/or that the errors cannot be normally distributed; for example, the data may be binary, counts for which negative values are impossible, or proportions constrained between 0 and 1. A glm seeks to determine how much of the variation in the response variable can be explained by each explanatory variable, and whether such relationships are statistically significant. The data for generalized linear models take the form of a continuous response variable and a combination of continuous and discrete explanatory variables.


Author(s):  
Lili Puspita Rahayu ◽  
Kusman Sadik ◽  
Indahwati Indahwati

The Poisson distribution is a discrete distribution that is often used to model rare events. The data take the form of counts, that is, non-negative integers. Poisson regression is one of the methods used to model count data. A violation of its assumptions that often occurs is overdispersion. One cause of overdispersion is an excess of zeros in the response variable. A model used to address this kind of overdispersion is zero-inflated Poisson (ZIP) regression. This research aimed to study overdispersion in Poisson and ZIP regression under several data characteristics. These characteristics were simulated by combining the Poisson distribution parameter (λ), the zero probability (p), and the sample size (n) for the response variable, and the Poisson and ZIP regression models were then compared. The simulation study showed that the larger λ, n, and p are, the better the ZIP model performs relative to Poisson regression. These results are supported by an exploration of the Pearson residuals in Poisson and ZIP regression.
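
The link between excess zeros and overdispersion can be illustrated with a minimal simulation sketch. It is not the authors' simulation design: the values of λ, p, and n, the seed, and the helper names (`poisson_draw`, `zip_draw`, `dispersion_index`) are all assumptions chosen for illustration. The sketch compares the variance-to-mean ratio of plain Poisson counts (close to 1) with that of zero-inflated counts, for which the theoretical ratio is 1 + pλ.

```python
import math
import random
from statistics import mean, variance


def poisson_draw(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def zip_draw(lam, p, rng):
    """Zero-inflated Poisson: a structural zero with probability p, else Poisson(lam)."""
    return 0 if rng.random() < p else poisson_draw(lam, rng)


def dispersion_index(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return variance(counts) / mean(counts)


# Illustrative (assumed) settings, not taken from the study.
rng = random.Random(1)
lam, p, n = 3.0, 0.3, 20000

poisson_counts = [poisson_draw(lam, rng) for _ in range(n)]
zip_counts = [zip_draw(lam, p, rng) for _ in range(n)]

print("Poisson dispersion index:", dispersion_index(poisson_counts))
print("ZIP dispersion index:    ", dispersion_index(zip_counts))
print("Theoretical ZIP index:   ", 1 + p * lam)
```

With these settings the zero-inflated sample should show a dispersion index well above 1, which is the pattern a plain Poisson regression cannot accommodate.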


Author(s):  
Constantin Ahlmann-Eltze ◽  
Wolfgang Huber

Abstract Motivation The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts (Grün et al., 2014; Svensson, 2020; Silverman et al., 2018; Hafemeister and Satija, 2019) and an essential building block for analysis approaches including differential expression analysis (Robinson et al., 2010; McCarthy et al., 2012; Anders and Huber, 2010; Love et al., 2014), principal component analysis (Townes et al., 2019) and factor analysis (Risso et al., 2018). Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability The package glmGamPoi is available from Bioconductor for Windows, macOS, and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license.
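
For readers unfamiliar with the model, one common way to write the Gamma-Poisson (negative binomial) distribution is as a Poisson whose rate is itself Gamma distributed. The parameterization below, with mean μ and overdispersion α, is an assumed notation for illustration and is not taken from the abstract:

```latex
\[
Y \mid \lambda \sim \mathrm{Poisson}(\lambda), \qquad
\lambda \sim \mathrm{Gamma}\!\left(\text{shape} = \tfrac{1}{\alpha},\ \text{scale} = \alpha\mu\right)
\quad\Longrightarrow\quad
\mathbb{E}[Y] = \mu, \qquad \mathrm{Var}[Y] = \mu + \alpha\mu^{2}.
\]
```

The variance follows from the law of total variance, Var[Y] = E[λ] + Var[λ], and it reduces to the Poisson variance μ as α approaches 0.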


Author(s):  
Constantin Ahlmann-Eltze ◽  
Wolfgang Huber

Abstract Motivation The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts (Grün et al., 2014; Townes et al., 2019; Svensson, 2020; Silverman et al., 2018; Hafemeister and Satija, 2019) and an essential building block for analysis approaches including differential expression analysis (Robinson et al., 2010; McCarthy et al., 2012; Anders and Huber, 2010; Love et al., 2014), principal component analysis (Townes et al., 2019) and factor analysis (Risso et al., 2018). Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which typically comprise thousands or millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability The package glmGamPoi is available from Bioconductor (since release 3.11) for Windows, macOS, and Linux, and source code is available on GitHub under a GPL-3 license. The scripts to reproduce the results of this paper are available on GitHub.


2021 ◽  
Vol 95 ◽  
Author(s):  
D. Rubel ◽  
N. Flaibani

Abstract The aim of this study was to explore, through a cross-sectional study, the variation in the prevalence of parasitic helminths in canine faeces collected from green spaces of Buenos Aires according to the human density (HD) and economic level (EL) in the surroundings. HD and EL were considered as independent variables with three categories each. Twenty public squares (one hectare of surface) were randomly selected for each existing combination of the two independent variables. Ten random samples of fresh canine faeces were obtained in each square and analysed for helminths by the sedimentation and flotation techniques. The prevalence of each species was analysed using generalized linear models (GLM), with a binomial error distribution and a logit link function. Helminth eggs were detected in 45 out of the 200 (22.5%) faecal samples collected and in 18 of the 20 green spaces sampled. The species observed were Ancylostoma caninum (13% of samples), Trichuris vulpis (8%) and Toxocara canis (4.5%). The GLM indicated that the prevalence of A. caninum in the slum areas (very high HD and very low EL) was higher than that in the other areas studied. However, the HD seemed to contribute more than the EL to the variations in the prevalence of A. caninum in faecal samples. The GLM showed no differences in the prevalence of the other parasite species for the different levels of the independent variables.


2021 ◽  
Vol 1 (1) ◽  
pp. 99-112
Author(s):  
Richard Larouche ◽  
Nimesh Patel ◽  
Jennifer L. Copeland

The role of infrastructure in encouraging transportation cycling in smaller cities with a low prevalence of cycling remains unclear. To investigate the relationship between the presence of infrastructure and transportation cycling in a small city (Lethbridge, AB, Canada), we interviewed 246 adults along a recently-constructed bicycle boulevard and two comparison streets with no recent changes in cycling infrastructure. One comparison street had a separate multi-use path and the other had no cycling infrastructure. Questions addressed time spent cycling in the past week and 2 years prior and potential socio-demographic and psychosocial correlates of cycling, including safety concerns. Finally, we asked participants what could be done to make cycling safer and more attractive. We examined predictors of cycling using gender-stratified generalized linear models. Women interviewed along the street with a separate path reported cycling more than women on the other streets. A more favorable attitude towards cycling and greater habit strength were associated with more cycling in both men and women. Qualitative data revealed generally positive views about the bicycle boulevard, a need for education about sharing the road and for better cycling infrastructure in general. Our results suggest that, even in smaller cities, cycling infrastructure may encourage cycling, especially among women.


Author(s):  
Dexter Cahoy ◽  
Elvira Di Nardo ◽  
Federico Polito

Abstract Within the framework of probability models for overdispersed count data, we propose the generalized fractional Poisson distribution (gfPd), which is a natural generalization of the fractional Poisson distribution (fPd), and the standard Poisson distribution. We derive some properties of gfPd and more specifically we study moments, limiting behavior and other features of fPd. The skewness suggests that fPd can be left-skewed, right-skewed or symmetric; this makes the model flexible and appealing in practice. We apply the model to real big count data and estimate the model parameters using maximum likelihood. Then, we turn to the very general class of weighted Poisson distributions (WPD’s) to allow both overdispersion and underdispersion. Similarly to Kemp’s generalized hypergeometric probability distribution, which is based on hypergeometric functions, we analyze a class of WPD’s related to a generalization of Mittag–Leffler functions. The proposed class of distributions includes the well-known COM-Poisson and the hyper-Poisson models. We characterize conditions on the parameters allowing for overdispersion and underdispersion, and analyze two special cases of interest which have not yet appeared in the literature.

