Non-Convex Penalized Estimation of Count Data Responses via Generalized Linear Model (GLM)

This study presents a non-convex penalized estimation procedure using the Smoothly Clipped Absolute Deviation (SCAD) penalty and the Minimax Concave Penalty (MCP) for count data responses, to address the problem of the number of covariates exceeding the sample size. The Generalized Linear Model (GLM) approach was adopted to obtain the penalized likelihood functions required for MCP and SCAD non-convex penalization of Binomial, Poisson and Negative Binomial regression of count responses. A case study of colorectal cancer with six (6) covariates and a sample size of five (5) was subjected to non-convex penalized estimation under the three distributions. The results showed that non-convex penalization of the Binomial regression via MCP and SCAD best identified the four un-penalized covariates needed to determine whether surgery or therapy is ideal for treating the cancer.
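The abstract names SCAD and MCP but does not state their forms. As a hedged sketch, the Python functions below implement the standard SCAD penalty of Fan and Li (2001) and the MCP of Zhang (2010), plus their univariate thresholding rules for a single standardized covariate under a least-squares loss. The function names and the defaults a = 3.7 and gamma = 3 are illustrative assumptions, not values reported in the study. In a penalized GLM the coefficients minimize the negative log-likelihood (scaled by 1/n) plus the sum of these penalties over the covariates, and coordinate-descent solvers typically apply rules like these to a local quadratic approximation of that likelihood.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p(|theta|); requires lam > 0 and a > 2."""
    t = abs(theta)
    if t <= lam:
        return lam * t                                   # lasso-like near zero
    if t <= a * lam:
        return (2 * a * lam * t - t ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2                        # constant for large |theta|


def mcp_penalty(theta: float, lam: float, gamma: float = 3.0) -> float:
    """Minimax concave penalty p(|theta|); requires lam > 0 and gamma > 1."""
    t = abs(theta)
    if t <= gamma * lam:
        return lam * t - t ** 2 / (2 * gamma)
    return gamma * lam ** 2 / 2                          # constant for large |theta|


def soft_threshold(z: float, lam: float) -> float:
    """Lasso soft-thresholding operator S(z, lam)."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """Univariate SCAD estimate for a standardized covariate (hypothetical helper)."""
    if abs(z) <= 2 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= a * lam:
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z                                             # large signals left unshrunk


def mcp_threshold(z: float, lam: float, gamma: float = 3.0) -> float:
    """Univariate MCP ("firm thresholding") estimate (hypothetical helper)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1 - 1 / gamma)
    return z                                             # large signals left unshrunk
```

For example, scad_threshold(0.5, 1.0) and mcp_threshold(0.5, 1.0) both return 0.0, while an input of 5.0 exceeds both a*lam and gamma*lam and is returned unchanged. This flat region of the penalty is what lets non-convex penalties drop weak covariates while leaving strong ones essentially un-penalized.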

A semiparametric negative binomial generalized linear model for modeling over-dispersed count data with a heavy tail: Characteristics and applications to crash data

Accident Analysis & Prevention ◽

10.1016/j.aap.2016.02.020 ◽

2016 ◽

Vol 91 ◽

pp. 10-18 ◽

Cited By ~ 19

Author(s):

Mohammadali Shirazi ◽

Dominique Lord ◽

Soma Sekhar Dhavala ◽

Srinivas Reddy Geedipally

Keyword(s):

Linear Model ◽

Count Data ◽

Generalized Linear Model ◽

Negative Binomial ◽

Heavy Tail ◽

Crash Data

Generalized Linear Model Approach for Time Series Count Data on Number of Foreign Tourists Modeling in West Java

Journal of Physics: Conference Series ◽

10.1088/1742-6596/1863/1/012025 ◽

2021 ◽

Vol 1863 (1) ◽

pp. 012025

Author(s):

E Rohaeti ◽

A Kurnia ◽

K Sadik ◽

I M Sumertajaya

Keyword(s):

Time Series ◽

Linear Model ◽

Count Data ◽

Generalized Linear Model ◽

West Java ◽

Model Approach

Statistical models for analyzing count data: predictors of length of stay among HIV patients in Portugal using a multilevel model

BMC Health Services Research ◽

10.1186/s12913-021-06389-1 ◽

2021 ◽

Vol 21 (1) ◽

Author(s):

Ahmed Nabil Shaaban ◽

Bárbara Peleteiro ◽

Maria Rosario O. Martins

Keyword(s):

Length Of Stay ◽

Regression Model ◽

Random Effects ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Comprehensive Approach ◽

Negative Binomial Regression Model ◽

HIV Patients ◽

Binomial Regression

Background: This study offers a comprehensive approach to precisely analyze the complexly distributed length of stay among HIV admissions in Portugal.

Objective: To illustrate statistical techniques for analyzing count data, using longitudinal predictors of length of stay among HIV hospitalizations in Portugal.

Method: A total of 26,505 discharges registered in Portuguese National Health Service (NHS) facilities between January 2009 and December 2017, classified under the Major Diagnostic Category (MDC) created for patients with HIV infection and with HIV/AIDS as a main or secondary cause of admission, were used to predict length of stay among HIV hospitalizations in Portugal. Several strategies were applied to select the best-fitting count model among the Poisson, zero-inflated Poisson, negative binomial, and zero-inflated negative binomial regression models. A random hospital-effects term was incorporated into the negative binomial model to examine the dependence between observations within the same hospital. A multivariable analysis was performed to assess the effect of covariates on length of stay.

Results: The median length of stay in our study was 11 days (interquartile range: 6–22). Statistical comparisons among the count models showed that the random-effects negative binomial models provided the best fit to the observed data. Admissions of males, and admissions associated with TB infection, pneumocystis, cytomegalovirus, candidiasis, toxoplasmosis, or mycobacterial disease, showed a highly significant increase in length of stay. Consistent trends were observed in which a higher number of diagnoses or procedures led to a significantly longer length of stay. The random-effects term, which captures unexplained factors specific to each hospital, revealed clear differences in quality among the hospitals included in our study.

Conclusions: This study provides a comprehensive approach to the unique problems associated with predicting length of stay among HIV patients in Portugal.
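The abstract compares Poisson, zero-inflated Poisson, negative binomial, and zero-inflated negative binomial fits. As a minimal sketch of the simplest such comparison, the code below fits intercept-only Poisson and negative binomial (NB2, Var = mu + mu^2/k) models by maximum likelihood and compares their AIC. It ignores covariates, zero inflation, and the random hospital effect, assumes the counts have a positive mean, and all function names are hypothetical rather than the authors' code.

```python
import math
from typing import Dict, Sequence


def poisson_loglik(y: Sequence[int], mu: float) -> float:
    """Poisson log-likelihood with a common mean mu (> 0)."""
    return sum(yi * math.log(mu) - mu - math.lgamma(yi + 1) for yi in y)


def nb_loglik(y: Sequence[int], mu: float, k: float) -> float:
    """NB2 log-likelihood: mean mu, size k, Var(Y) = mu + mu**2 / k."""
    return sum(
        math.lgamma(yi + k) - math.lgamma(k) - math.lgamma(yi + 1)
        + k * math.log(k / (k + mu)) + yi * math.log(mu / (k + mu))
        for yi in y
    )


def fit_nb_size(y: Sequence[int], mu: float,
                lo: float = -5.0, hi: float = 10.0, iters: int = 100) -> float:
    """Golden-section search over log(k) for the size maximizing nb_loglik."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = nb_loglik(y, mu, math.exp(c)), nb_loglik(y, mu, math.exp(d))
    for _ in range(iters):
        if fc > fd:                      # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = nb_loglik(y, mu, math.exp(c))
        else:                            # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = nb_loglik(y, mu, math.exp(d))
    return math.exp((a + b) / 2)


def compare_aic(y: Sequence[int]) -> Dict[str, float]:
    """AIC of intercept-only Poisson (1 parameter) vs NB2 (2 parameters)."""
    mu = sum(y) / len(y)                 # MLE of the mean in both models
    k = fit_nb_size(y, mu)
    return {
        "poisson": 2 * 1 - 2 * poisson_loglik(y, mu),
        "negbin": 2 * 2 - 2 * nb_loglik(y, mu, k),
        "k": k,
    }
```

A lower AIC for the negative binomial fit is the usual indication that the extra dispersion parameter is warranted; zero-inflated and random-effects variants extend the same likelihood-based comparison with more parameters.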

Mixed Generalized Linear Model for Estimating Household Trip Production

Transportation Research Record: Journal of the Transportation Research Board ◽

10.3141/1718-08 ◽

2000 ◽

Vol 1718 (1) ◽

pp. 61-67 ◽

Cited By ~ 2

Author(s):

Chang-Jen Lan ◽

Patricia S. Hu

Keyword(s):

Linear Model ◽

Generalized Linear Model ◽

Negative Binomial ◽

Empirical Distribution ◽

Distribution Functions ◽

Model Specification ◽

Parameter Estimates ◽

Modeling Framework ◽

Personal Transportation ◽

Trip Production

An innovative modeling framework for estimating household trip rates from 1995 Nationwide Personal Transportation Survey data is presented. A generalized linear model with a mixture of negative binomial probability distribution functions was developed based on characteristics observed in the empirical distribution of household daily trips. This model provides a more flexible framework and a better model specification for analyzing household-specific trip production behavior. Compared with traditional least-squares regression models, the proposed model yields more efficient parameter estimates. Although the mean accuracies of the two modeling approaches are comparable, the mixed generalized linear model is more robust in identifying outliers because of its asymmetric prediction bounds, which derive from a more accurate model specification.
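The abstract does not give the mixture's exact form. A minimal sketch, assuming a finite mixture of NB2 components whose means each follow a log link on the covariates, is shown below; the number of components, the parameterization, and all function names are assumptions, and parameter estimation (for example by EM) is not shown.

```python
import math
from typing import List, Sequence


def nb_pmf(y: int, mu: float, k: float) -> float:
    """NB2 probability mass with mean mu and size k (Var = mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def component_means(x: Sequence[float], betas: Sequence[Sequence[float]]) -> List[float]:
    """Log link for each component j: mu_j = exp(x . beta_j) (hypothetical helper)."""
    return [math.exp(sum(xi * bi for xi, bi in zip(x, beta))) for beta in betas]


def mixture_nb_pmf(y: int, weights: Sequence[float],
                   mus: Sequence[float], ks: Sequence[float]) -> float:
    """P(Y = y) for a finite mixture of NB2 components (weights sum to 1)."""
    return sum(w * nb_pmf(y, mu, k) for w, mu, k in zip(weights, mus, ks))
```

Because each component has its own mean and dispersion, the mixture can reproduce multimodal or heavily skewed trip-count distributions that a single negative binomial cannot, which is also why its prediction bounds need not be symmetric.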

Managing Inflation

Crime & Delinquency ◽

10.1177/0011128716679796 ◽

2016 ◽

Vol 63 (1) ◽

pp. 77-87 ◽

Cited By ~ 5

Author(s):

William H. Fisher ◽

Stephanie W. Hartwell ◽

Xiaogang Deng

Keyword(s):

Count Data ◽

Regression Models ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Excess Zeros ◽

Binomial Regression ◽

Negative Binomial Models ◽

Using Data ◽

Over Dispersion ◽

Binomial Models

Poisson and negative binomial regression procedures have proliferated and are now available in virtually all statistical packages. Alongside the regression procedures themselves are procedures for addressing the over-dispersion and excess zeros commonly observed in count data. These approaches, zero-inflated Poisson and zero-inflated negative binomial models, use logit or probit models for the “excess” zeros and count regression models for the counts. Although these models are often appropriate on statistical grounds, their interpretation may prove substantively difficult. This article explores this dilemma using data from a study of individuals released from facilities maintained by the Massachusetts Department of Correction.
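To make the two-part structure concrete, the sketch below writes out a zero-inflated Poisson (ZIP) model with a logit model for the excess zeros and a log-linear Poisson count model. The function names and the choice of the logit rather than the probit link are illustrative assumptions.

```python
import math
from typing import Sequence, Tuple


def logistic(t: float) -> float:
    """Inverse logit link used for the excess-zero probability."""
    return 1.0 / (1.0 + math.exp(-t))


def zip_parts(x: Sequence[float], z: Sequence[float],
              beta: Sequence[float], gamma: Sequence[float]) -> Tuple[float, float]:
    """Return (pi, lam): logit model for excess zeros, log-linear Poisson mean."""
    pi = logistic(sum(zi * gi for zi, gi in zip(z, gamma)))
    lam = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return pi, lam


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) under a zero-inflated Poisson with lam > 0."""
    poisson = math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))
    return pi + (1 - pi) * poisson if y == 0 else (1 - pi) * poisson


def zip_mean(pi: float, lam: float) -> float:
    """Marginal mean E[Y] = (1 - pi) * lam, which mixes both parts of the model."""
    return (1 - pi) * lam
```

Because the marginal mean (1 - pi) * lam depends on both parts, a covariate entering both the logit and the count equations changes E[Y] through two channels, which is one concrete reason the substantive interpretation can be difficult.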

Sample Size Estimation for Negative Binomial Regression Comparing Rates of Recurrent Events with Unequal Follow-Up Time

Journal of Biopharmaceutical Statistics ◽

10.1080/10543406.2014.971167 ◽

2014 ◽

Vol 25 (5) ◽

pp. 1100-1113 ◽

Cited By ~ 15

Author(s):

Yongqiang Tang

Keyword(s):

Sample Size ◽

Recurrent Events ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Sample Size Estimation ◽

Size Estimation ◽

Binomial Regression

A Combined PLS and Negative Binomial Regression Model for Inferring Association Networks from Next-Generation Sequencing Count Data

IEEE/ACM Transactions on Computational Biology and Bioinformatics ◽

10.1109/tcbb.2017.2665495 ◽

2018 ◽

Vol 15 (3) ◽

pp. 760-773 ◽

Cited By ~ 2

Author(s):

Maiju Pesonen ◽

Jaakko Nevalainen ◽

Steven Potter ◽

Somnath Datta ◽

Susmita Datta

Keyword(s):

Next Generation Sequencing ◽

Regression Model ◽

Count Data ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Negative Binomial Regression Model ◽

Next Generation ◽

Binomial Regression ◽

Generation Sequencing

Modeling with Geographically Weighted Negative Binomial Regression (Case Study: Number of Leprosy Patients in West Java)

Xplore Journal of Statistics ◽

10.29244/xplore.v10i3.833 ◽

2021 ◽

Vol 10 (3) ◽

pp. 226-236

Author(s):

Khusnul Khotimah ◽

Itasia Dina Sulvianti ◽

Pika Silvianti

Keyword(s):

Regression Model ◽

Count Data ◽

Poisson Regression ◽

Negative Binomial ◽

Negative Binomial Regression ◽

Kernel Weight ◽

Negative Binomial Regression Model ◽

West Java ◽

Binomial Regression ◽

Spatial Heterogeneity

The number of leprosy patients in West Java is an example of count data, and the analysis commonly used for count data is Poisson regression. This research identifies the variables that influence the number of leprosy patients in West Java, using data on the number of leprosy patients in West Java in 2019. These data exhibit overdispersion and spatial heterogeneity. Overdispersion can be handled with a negative binomial regression model, while spatial heterogeneity is addressed by adding adaptive bisquare kernel weights. The resulting Geographically Weighted Negative Binomial Regression (GWNBR) with adaptive bisquare kernel weighting classifies the regencies/cities of West Java into ten groups based on the variables that significantly influence the number of leprosy patients. In general, the percentage of households practicing Clean and Healthy Behavior (PHBS) has a significant effect in all regencies/cities in West Java. For Bogor Regency, Depok City, Bogor City, and Pangandaran Regency in particular, the percentage of people living in poverty does not have a significant effect on the number of leprosy patients.
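The adaptive bisquare kernel can be sketched as follows, assuming the common form w_ij = (1 - (d_ij / b_i)^2)^2 for d_ij < b_i and 0 otherwise, where d_ij is the distance between locations i and j and b_i is the distance from location i to its q-th nearest neighbor. The function name and the neighbor-counting convention are assumptions, and the study's own bandwidth selection is not shown. The code needs Python 3.8+ for `math.dist`.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def adaptive_bisquare_weights(points: Sequence[Point], q: int) -> List[List[float]]:
    """Row i holds the weights w_ij for the local regression at location i.

    Assumes distinct points and 1 <= q < len(points). The bandwidth b_i is the
    distance from location i to its q-th nearest neighbor; the point itself
    sits at index 0 of the sorted distances, so it is not counted.
    """
    weights = []
    for p_i in points:
        d = [math.dist(p_i, p_j) for p_j in points]
        b = sorted(d)[q]                      # adaptive bandwidth for location i
        weights.append([(1 - (dij / b) ** 2) ** 2 if dij < b else 0.0 for dij in d])
    return weights
```

Row i of the result would serve as the observation weights in the local negative binomial likelihood at location i, so each regency/city receives its own coefficient estimates; because b_i adapts to the local density of locations, sparse areas borrow information from farther away than dense ones.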

Poisson GSTAR model: spatial temporal modeling count data follow generalized linear model and count time series models

Journal of Physics: Conference Series ◽

10.1088/1742-6596/1490/1/012010 ◽

2020 ◽

Vol 1490 ◽

pp. 012010

Author(s):

L P Wardhani ◽

Setiawan ◽

Suhartono ◽

H Kuswanto

Keyword(s):

Time Series ◽

Linear Model ◽

Count Data ◽

Generalized Linear Model ◽

Time Series Models ◽

Count Time Series ◽

Temporal Modeling

ComBat-seq: batch effect adjustment for RNA-seq count data

NAR Genomics and Bioinformatics ◽

10.1093/nargab/lqaa078 ◽

2020 ◽

Vol 2 (3) ◽

Cited By ~ 2

Author(s):

Yuqing Zhang ◽

Giovanni Parmigiani ◽

W Evan Johnson

Keyword(s):

Differential Expression ◽

Count Data ◽

Statistical Power ◽

Negative Binomial ◽

Genomic Data ◽

Negative Binomial Regression ◽

Negative Binomial Regression Model ◽

RNA-Seq ◽

Batch Effects ◽

Binomial Regression

The benefit of integrating batches of genomic data to increase statistical power is often hindered by batch effects, or unwanted variation in data caused by differences in technical factors across batches. It is therefore critical to address batch effects in genomic data effectively. Many existing methods for batch-effect adjustment assume that the data follow a continuous, bell-shaped Gaussian distribution. However, in RNA-seq studies the data are typically skewed, over-dispersed counts, so this assumption is not appropriate and may lead to erroneous results. Negative binomial regression models have previously been used to better capture the properties of counts. We developed a batch correction method, ComBat-seq, that uses a negative binomial regression model and retains the integer nature of count data in RNA-seq studies, making the batch-adjusted data compatible with common differential expression software packages that require integer counts. We show in realistic simulations that data adjusted with ComBat-seq yield better statistical power and control of false positives in differential expression than data adjusted by other available methods. We further demonstrate in a real data example that ComBat-seq successfully removes batch effects and recovers the biological signal in the data.
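One way to keep adjusted values as integers is to match cumulative probabilities between a batch-specific and a batch-free negative binomial distribution. The sketch below assumes both (mean, size) pairs have already been estimated, for example from a negative binomial regression with batch terms; it is an illustration of quantile matching, not the ComBat-seq source, and the function names are hypothetical.

```python
import math


def nb_pmf(y: int, mu: float, k: float) -> float:
    """NB2 probability mass with mean mu and size k (Var = mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def nb_cdf(y: int, mu: float, k: float) -> float:
    """P(Y <= y), accumulated term by term."""
    return sum(nb_pmf(j, mu, k) for j in range(y + 1))


def match_quantile(y: int, mu_batch: float, k_batch: float,
                   mu_target: float, k_target: float, max_count: int = 100_000) -> int:
    """Smallest integer count whose target CDF reaches the batch CDF of y."""
    p = nb_cdf(y, mu_batch, k_batch)
    cum = 0.0
    for y_new in range(max_count + 1):
        cum += nb_pmf(y_new, mu_target, k_target)
        if cum >= p - 1e-12:                  # small tolerance for rounding
            return y_new
    return max_count
```

When the batch and target distributions coincide, every count maps to itself; when the target mean is larger (same size parameter), counts can only move up, so the adjustment shifts counts toward the batch-free distribution while always returning integers.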
